Abstract



It is widely believed that computing payments needed to induce truthful bidding is somehow harder 
than simply computing the allocation. We show that the opposite is true for single-parameter domains; 
creating a randomized truthful mechanism is essentially as easy as a single call to a monotone allocation 
function. Our main result is a general procedure to take a monotone allocation rule and transform it 
(via a black-box reduction) into a randomized mechanism that is truthful in expectation and individually 
rational for every realization. Moreover, the mechanism implements the same outcome as the original 
allocation rule with probability arbitrarily close to 1, and requires evaluating that allocation rule only 
once. 

Because our reduction is simple, versatile, and general, it has many applications to mechanism design 
problems in which re-evaluating the allocation function is either burdensome or informationally impossi- 
ble. Applying our result to the multi-armed bandit problem, we obtain truthful randomized mechanisms 
whose regret matches the information-theoretic lower bound up to logarithmic factors, even though prior 
work showed this is impossible for truthful deterministic mechanisms. We also present applications 
to offline mechanism design, showing that randomization can circumvent a communication complexity 
lower bound for deterministic payments computation, and that it can also be used to create truthful short- 
est path auctions that approximate the welfare of the VCG allocation arbitrarily well, while having the 
same running time complexity as Dijkstra's algorithm. 

ACM Categories and Subject Descriptors: J.4 [Social and Behavioral Sciences]: Economics; K.4.4 
[Computers and Society]: Electronic Commerce; F.2.2 [Analysis of Algorithms and Problem Complexity]: 
Nonnumerical Algorithms and Problems 
1 Introduction



Algorithmic Mechanism Design studies the problem of implementing the designer's goal under
computational constraints. Multiple hurdles stand in the way of such an implementation. Computing the desired
outcome might be hard (as in the case of combinatorial auctions) or truthful payments implementing the 
goal might not exist (as when exactly minimizing the make-span in machine scheduling f^"]). Even when 
payments that will generate the right incentives do exist, finding such payments might be computationally 
costly or impossible due to online constraints. 

It is widely believed that computing payments needed to induce truthful bidding is somehow harder than 
simply computing the allocation. For example, the formula for payments in a VCG mechanism involves 
recomputing the allocation with one agent removed in order to determine that agent's payment; this seem- 
ingly increases the required amount of computation by a factor of n + 1, where n is the number of agents. 
Likewise, for truthful single-parameter mechanisms the formula for payments of a given agent includes in- 
tegrating the allocation function over this agent's bid. In some contexts with incomplete information, such 
as online pay-per-click auctions, computing these "counterfactual allocations" may actually be
information-theoretically impossible. This calls into question the mechanism designer's ability to compute
payments that make an allocation rule truthful, even when such payment functions are known to exist.
Rigorous lower bounds based on these observations have been established for the communication
complexity [7] and regret [8, 10] of truthful deterministic mechanisms.

In contrast to these negative results, we show that the opposite is true for single-parameter mechanisms 
that are truthful-in-expectation (over their random seed): computing the allocation and payments is es- 
sentially as easy as a single call to a monotone allocation function. This allows for positive results that 
circumvent the lower bounds for deterministic mechanisms cited earlier. 

In more detail, our main result is a general procedure to take any monotone allocation rule¹ and transform
it (via a black-box reduction) into a randomized mechanism that is truthful in expectation, universally
ex-post individually rational,² implements the same outcome as the original allocation rule with probability
arbitrarily close to 1, and requires evaluating that allocation rule only once. Further, if the original allocation
rule is monotone in the ex-post sense, then the randomized mechanism is truthful in the ex-post sense as 
well. Our reduction applies in both the offline and online settings, and it applies to both Bayesian and 
dominant-strategy incentive compatibility. 

Because our reduction is simple, versatile, and general, it has many applications to mechanism design 
problems in which re-evaluating the allocation function is either burdensome or informationally impossible. 
A leading problem for which only a single allocation can be evaluated is the multi-armed bandit (MAB)
mechanism design problem [8, 10]. In this problem information about the state of the world is dynamically
revealed during the allocation, so that the particular information that is revealed depends on the prior choices
of the allocation, and in turn may impact the future choices. Simulating the allocation function on different
inputs may therefore require information that was not revealed on the actual run. As per [8], this lack of
information is a crucial obstacle for universally ex-post truthful MAB mechanisms: the appropriate payments
cannot be computed unless the allocation rule is very "naive".

Applying our reduction to the MAB problem, we derive that the problem of designing truthful MAB
mechanisms reduces to the problem of designing monotone MAB allocation rules. In particular, using a
monotone MAB allocation rule we obtain an MAB mechanism that is truthful in expectation, in the
ex-post sense, universally ex-post individually rational, and has regret $O(T^{1/2})$ for the stochastic version of
the problem. This regret bound matches the information-theoretic lower bound for stochastic multi-armed
bandit problems that holds even in the absence of incentive constraints. This stands in contrast to the lower
bound of [8], where it was shown that universally ex-post truthful mechanisms must suffer regret $\Omega(T^{2/3})$.

¹ An allocation rule is monotone if increasing one agent's bid while keeping all other bids the same does not decrease this agent's
allocation.

² A mechanism is individually rational if an agent never loses by participating in the mechanism and bidding truthfully. Here and
elsewhere in the paper, universally and ex-post refer to properties that hold for each realization of the random seed of the algorithm
and nature, respectively.

As a by-product of our results, we obtain an unconditional separation between the power of ex-post 
truthful-in-expectation and universally ex-post truthful mechanisms, in the online setting. This complements 
the recent result of Dobzinski and Dughmi [11], which gives a separation between these two classes of 
randomized mechanisms in the offline setting, under a polynomial communication complexity constraint. 
It is worth noting that the separation in [11] applies to a rather unnatural problem (two-player multi-unit
auctions in which if at least one item is allocated, then all items are allocated and each player receives at 
least one item) whereas our separation result is for a natural problem (online pay-per-click ad auctions for a 
single slot, with unknown clickthrough rates). 

Our main result also has implications for offline mechanism design. Nisan and Ronen, in their seminal
paper on algorithmic mechanism design [15], cite the apparent $n$-fold computational overhead of computing
VCG payments and pose the open question of whether payments can be computed faster than solving
$n$ versions of the original problem, e.g. for VCG path auctions. Our result shows that the answer is
affirmative, if one adopts the truthful-in-expectation solution concept and tolerates a mechanism that outputs
an outcome whose welfare is a $(1 + \epsilon)$-approximation to that of the VCG allocation, for arbitrarily small
$\epsilon > 0$. Babaioff et al. [7] present a social choice function $f$ in an $n$-player single-parameter domain such that
the deterministic communication complexity required for truthfully implementing $f$ exceeds that required
for evaluating $f$ by a factor of $n$. Our result shows that no such lower bound holds when one considers
randomized mechanisms, again allowing for a small amount of random error in the allocation.

Map of the paper. At a high level, this paper makes three contributions. Our main result is the general
reduction, as described above (Section 3). Then we have the two applications to offline mechanism design
(Section 4). Finally, we consider multi-armed bandit (MAB) mechanism design (Section 5). Here the
main issue is designing monotone MAB allocations, which has not been studied in the rich literature on
MAB problems. We make several contributions in that direction, including a general theorem stating that a
number of existing MAB algorithms give rise to monotone MAB allocations (Section 5.2), and a new ex-post
monotone algorithm for the stochastic setting (Section 5.3). As a by-product, we obtain an unconditional
separation between the power of ex-post truthful-in-expectation and universally ex-post truthful mechanisms
(Section 5.4). Presenting our results requires a significant amount of background material on algorithmic
mechanism design (Section 2) and multi-armed bandits (Section 5.1).

Related work. The characterization of truthful mechanisms for single-parameter agents, given by Myerson [14]
for single-item auctions and by Archer and Tardos [2] for a more general class of single-parameter
problems, states that a mechanism is truthful if and only if its allocation rule is monotone and its payment
rule charges each agent its value for the realized outcome, minus a correction term expressed as an integral
over all types lower than the agent's declared type. (See Eq. (4).) Exact computation of this "Myerson
integral" may be intractable, but Archer, Papadimitriou, Talwar, and Tardos [1] developed a clever workaround:
one can use random sampling to compute an unbiased estimator of the Myerson integral, at the cost of eval- 
uating the allocation function once more. Thus, for n agents, the allocation function must be evaluated n + 1 
times: once to determine the actual allocation, and once more per agent to determine that agent's payment. 
Our reduction relies on a generalization of the same random sampling technique, but we show how to avoid 
recomputing the allocation function when determining each agent's payment. 

As mentioned earlier, the question of whether the computational cost of computing payments is inherently
greater than that of computing an allocation was raised by Nisan and Ronen [15] in the context of
VCG path auctions. The most significant progress on this question to date was the communication
complexity lower bound of Babaioff et al. [7], who constructed a single-parameter domain with a monotone
social choice function (the so-called NOT-TOO-FAR$_k$ function) whose communication complexity is a factor
of $n$ smaller than the communication complexity of any deterministic incentive-compatible mechanism
implementing NOT-TOO-FAR$_k$. In contrast, our results show that no such lower bound arises when one considers
randomized truthful-in-expectation mechanisms with arbitrarily small probability of outputting the wrong
allocation.

The computation of payments in online mechanism design is a central issue in the analysis of truthful
MAB mechanisms in [8, 10]. The analysis of such mechanisms is motivated by the problem of designing
pay-per-click ad auctions when there is uncertainty about clickthrough rates. In the absence of incentive
constraints, the problem of learning clickthrough rates can be modeled as a MAB problem, and it is known
that there are algorithms whose regret (roughly speaking, the loss due to not knowing the clickthrough rates
at the outset of the auction) is $O(\sqrt{T})$, where $T$ is the number of impressions; moreover, this dependence on
$T$ is information-theoretically optimal.

The main result of [8] provides a characterization of deterministic mechanisms that are truthful for
every possible realization of the other agents' bids and the users' clicks (the characterization extends to
randomized mechanisms that are universally truthful). The characterization is more restrictive than the
Myerson and Archer-Tardos characterization of truthful single-parameter mechanisms, because computing
an agent's payment requires knowing how many clicks she would have received if she had submitted a lower
bid value. It is typically impossible to obtain this information in the online setting, since one cannot
go back into the past and allocate impressions to a different bidder to see whether a user
would have clicked on that bidder's advertisement. Using the characterization of universally truthful MAB
mechanisms, [8] proves that any such mechanism must incur regret $\Omega(T^{2/3})$. Here we show that randomized
mechanisms that are truthful in expectation and universally ex-post individually rational can achieve
regret $O(T^{1/2})$, matching the information-theoretic lower bound for
MAB problems in the absence of incentive constraints.

Finally, dynamic auctions S O is another setting in which information is revealed "dynamically" (over 
time). However, while in MAB auctions all information from the agents (the bids) is submitted only once 
and then information is revealed to the mechanism by nature over time, in dynamic auctions the agents 
continuously observe private "signals" from nature and submit "actions" to the mechanism. Accordingly, 
providing the right incentives becomes much more challenging. On the other hand, existing work ||9l [H has 
focused on a fully Bayesian setting with known priors on the signals, whereas none of our results relies
on priors.

2 Preliminaries 

Single Parameter Domains. We present the single-parameter model to which we apply our procedure.
The model is very similar to the model of Archer and Tardos [2], yet it is slightly more general. We state the
model in terms of values rather than costs, and allow the values to be both positive and negative. We also allow
randomization by nature. All these changes are minor and do not change the fundamental characterization,
yet they are helpful for deriving our results later.

Let $n$ be the number of agents and let $N = [n]$ be the set of agents. Each agent $i \in N$ has some private
type consisting of a single parameter $t_i \in T_i$ that describes the agent and is known only to $i$; everything else
is public knowledge. We assume that the domain $T_i$ is an open subset of $\mathbb{R}$ which is an interval with positive
length (possibly starting from $-\infty$ or going up to $\infty$). Let $T = T_1 \times T_2 \times \cdots \times T_n$ denote the domain of
types and let $t \in T$ denote the vector of true types.

There is some set of outcomes $O$. For single-parameter domains, agents evaluate outcomes in a particular
way that we describe next. For each agent $i \in N$ there is a function $a_i : O \to \mathbb{R}_+$ specifying the allocation
to agent $i$. The value of an outcome $o \in O$ for an agent $i \in N$ with type $t_i$ is $t_i \cdot a_i(o)$. The utility that agent
$i \in N$ derives from outcome $o \in O$ when he is charged $p_i$ is quasi-linear: $u_i = t_i \cdot a_i(o) - p_i$.

For instance, consider the allocation of $k$ identical units of a good to agents with additive valuations: agent
$i$ has a value of $t_i$ per unit. An outcome $o$ specifies how many items each agent receives, that is, $a_i(o)$ is the
number of items $i$ receives. His valuation for that outcome is his value per unit times the number of units he
receives.

A (direct revelation) deterministic mechanism $\mathcal{M}$ consists of the pair $(\mathcal{A}, \mathcal{P})$, where $\mathcal{A} : T \to O$ is the
allocation rule and $\mathcal{P} : T \to \mathbb{R}^n$ is the payment rule, i.e. the vector of payment functions $\mathcal{P}_i : T \to \mathbb{R}$ for
each agent $i$. Each agent is required to report a type $b_i \in T_i$ to the mechanism, and $b_i$ is called the bid of
agent $i$. We denote the vector of bids by $b \in T$. The mechanism picks an outcome $\mathcal{A}(b)$ and charges agent
$i$ a payment of $\mathcal{P}_i(b)$. The allocation for agent $i$ when the bids are $b$ is $\mathcal{A}_i(b) = a_i(\mathcal{A}(b))$.
Agent $i$'s utility when the agents bid $b \in T$ and his type is $t_i \in T_i$ is

$$u_i(t_i, b) = t_i \cdot \mathcal{A}_i(b) - \mathcal{P}_i(b). \tag{1}$$

We also consider randomized mechanisms, which are distributions over deterministic mechanisms. For a
randomized mechanism, $\mathcal{A}_i(b)$ and $\mathcal{P}_i(b)$ denote the expected allocation and the expected payment charged to
agent $i$ when the bids are $b$. The expectation is taken over the randomness of the mechanism. Sometimes it
will be helpful to consider explicitly the deterministic allocation and payment generated for a specific
random seed; in this case we use $w$ to denote the random seed and use $\mathcal{A}_i(b; w)$ and $\mathcal{P}_i(b; w)$ to denote the
allocation and payment when the seed is $w$.

It is possible that there is some outside randomization that influences the outcome and is not controlled
by the mechanism. We call this randomization by nature. An example would be the randomness in
the realization of clicks in a sponsored search auction. With such randomization, $\mathcal{A}_i(b)$ and $\mathcal{P}_i(b)$ also
encapsulate expectations over nature's randomization. Finally, we use the notation $\mathcal{A}_i(b; w, r)$ and $\mathcal{P}_i(b; w, r)$ to
denote the allocation and payment charged to agent $i$ when the bids are $b$, the mechanism's random seed
is $w$, and nature's random seed is $r$.

Allocation and Mechanism Properties. Let $b_{-i}$ denote the vector of bids of all agents but agent $i$. We can
now write the vector of bids as $b = (b_{-i}, b_i)$. Similar notation will be used for other vectors.
We next list two central properties, truthfulness and individual rationality.

• Mechanism $\mathcal{M}$ is truthful if for every agent $i$ truthful bidding is a dominant strategy in $\mathcal{M}$. That is,
for every agent $i$, bidding $t_i$ always maximizes her utility, regardless of what the other agents bid.
Formally,

$$t_i \cdot \mathcal{A}_i(b_{-i}, t_i) - \mathcal{P}_i(b_{-i}, t_i) \ge t_i \cdot \mathcal{A}_i(b) - \mathcal{P}_i(b) \tag{2}$$

holds for every agent $i \in N$, type $t_i \in T_i$, bids of others $b_{-i} \in T_{-i}$ and bid $b_i \in T_i$ of agent $i$.

• Mechanism $\mathcal{M}$ is individually rational (IR) if an agent never ends up with negative utility by
participating in the mechanism and bidding truthfully. Formally,

$$t_i \cdot \mathcal{A}_i(b_{-i}, t_i) - \mathcal{P}_i(b_{-i}, t_i) \ge 0 \tag{3}$$

holds for every agent $i \in N$, type $t_i \in T_i$ and bids of others $b_{-i} \in T_{-i}$.

It will be helpful to establish terminology for the case that the above hold not only in expectation but
also for specific realizations. For example, we will say that a mechanism is universally truthful if Eq. (2)
holds not only in expectation over the mechanism's randomness, but rather for every realization of that
randomness. In general, every property that we define is defined by some inequality; if the inequality
holds for every realization of the mechanism's randomness we say that it holds universally, and if it holds for
every realization of nature's randomness we say that it holds ex-post. When we want to emphasize that the
property holds only in expectation over nature's randomness we say that it holds stochastically.

Note that in an individually rational mechanism an agent is ensured not to incur any loss in expectation.
That is rather unsatisfying, as for some realizations the agent might suffer a huge loss. It is more desirable
to design mechanisms that are universally ex-post individually rational, that is, a truthful agent should incur
no loss for any bids of the others and any realization of the random events (not only in expectation).

If all types are positive, then in addition to individual rationality it is desirable that all agents are charged
a non-negative amount; this is known as the no-positive-transfers property. Finally, the welfare of a truthful
mechanism is defined to be the total utility $\sum_{i \in N} t_i \cdot \mathcal{A}_i(t)$.

Characterization. The following characterization of truthful mechanisms, due to Archer and Tardos [2],
is almost identical to the characterization presented by Myerson [14] for truthful mechanisms in the special
case of single-item auctions. The crucial property of an allocation that yields truthfulness is monotonicity,
defined as follows:

Definition 2.1. Allocation rule $\mathcal{A}$ is monotone if for every agent $i \in N$, bids $b_{-i} \in T_{-i}$ and two possible
bids of $i$, $b_i \ge b_i^-$, we have $\mathcal{A}_i(b_{-i}, b_i) \ge \mathcal{A}_i(b_{-i}, b_i^-)$.

Recall that monotonicity of an allocation rule is also defined universally and/or ex-post.
We next present the characterization of truthful mechanisms. In the theorem statement, the expression
$\mathcal{A}_i(b_{-i}, u)$ is interpreted to equal zero when $u \notin T_i$.

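Theorem 2.2. [14, 2] Consider a single-parameter domain. An allocation rule $\mathcal{A}$ admits a payment rule $\mathcal{P}$
such that the mechanism $(\mathcal{A}, \mathcal{P})$ is truthful if and only if $\mathcal{A}$ is monotone and moreover for each agent $i$ and
bid vector $b$ it holds that $\int_{-\infty}^{b_i} \mathcal{A}_i(b_{-i}, u)\, du < \infty$. In this case the payment for each agent $i$ must satisfy

$$\mathcal{P}_i(b) = \mathcal{P}_i^0(b_{-i}) + b_i\, \mathcal{A}_i(b_{-i}, b_i) - \int_{-\infty}^{b_i} \mathcal{A}_i(b_{-i}, u)\, du, \tag{4}$$

where $\mathcal{P}_i^0(b_{-i})$ does not depend on $b_i$.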

A mechanism is called normalized if for each agent $i$ and every bid vector $b$, zero allocation implies a
zero payment: $\mathcal{A}_i(b) = 0 \Rightarrow \mathcal{P}_i(b) = 0$.

Corollary 2.3. The truthful mechanism in Theorem 2.2 is normalized if and only if $\mathcal{P}_i^0(b_{-i}) = 0$, in which
case the mechanism is also individually rational and for positive-only types ($T \subset \mathbb{R}_+^n$) it moreover satisfies
the no-positive-transfers property.
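
To make Eq. (4) concrete, the following Python sketch evaluates the normalized payment numerically for a single-item auction. It is an illustration under stated assumptions, not part of the characterization itself: the names `myerson_payment` and `single_item_alloc` are hypothetical, types are assumed non-negative (so the allocation is zero below 0 and the integral can start there), and the integral is approximated by a midpoint rule.

```python
def myerson_payment(alloc_i, bid, lower=0.0, steps=100_000):
    """Normalized payment b * A_i(b) - integral_{lower}^{b} A_i(u) du.

    alloc_i maps agent i's own bid u to her allocation, with the other
    bids held fixed. Assumption: A_i(u) = 0 for u < lower.
    """
    width = (bid - lower) / steps
    # Midpoint rule for the integral of the allocation over lower bids.
    integral = sum(alloc_i(lower + (k + 0.5) * width) for k in range(steps)) * width
    return bid * alloc_i(bid) - integral


def single_item_alloc(other_bids):
    """Monotone rule: agent i wins one item iff her bid beats all others."""
    highest_other = max(other_bids)
    return lambda u: 1.0 if u > highest_other else 0.0


if __name__ == "__main__":
    # The winner bids 5 against a competing bid of 3.
    print(myerson_payment(single_item_alloc([3.0]), 5.0))
```

In this example the winner's payment comes out at approximately 3, the second-highest bid, which is the familiar second-price rule; a losing bidder has zero allocation everywhere below her bid and therefore pays nothing.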

Remark 2.4. Both Theorem 2.2 and Corollary 2.3 hold in the "ex-post" sense (resp., "universal" sense),
if $\mathcal{A}_i$, $\mathcal{P}_i$ and $\mathcal{P}_i^0(b_{-i})$ are interpreted to mean the respective values for a specific random seed of nature
(resp., mechanism).

Note that for Corollary 2.3 to apply, truthfulness and normalization must hold in the same "sense", e.g.
both ex-post.
3 The Generic Procedure



This section presents our procedure that takes any monotone allocation rule for a single-parameter domain 
and creates a randomized truthful-in-expectation mechanism that is ex-post individually rational and with 
identical outcome as the original allocation rule with high probability. The procedure uses the allocation 
rule as a "black box," calls it only once and allocates accordingly. 

We begin with an informal description of our procedure before giving its formal definition. As evidenced
by Eq. (4), the payment for agent $i$ is a difference of two terms: the agent's reported utility (i.e., the product
of her bid and her allocation), minus the integral of the allocation assigned to every smaller bid value. We
charge the agent for her reported utility, and we give her a random rebate whose expectation equals the
required integral. When integrating a function over a finite interval, an unbiased estimator of the integral
can be obtained by sampling a uniformly random point of that interval and evaluating the function at the
sampled point. This idea was applied, in the context of mechanism design, by Archer et al. [1]. Below, we
show how to generalize the procedure to allow for integrals over unbounded intervals, as required by Eq. (4).
Using this procedure it is easy to transform any monotone allocation rule into a randomized mechanism that
is truthful in expectation and only evaluates the allocation function $n + 1$ times: once to determine the actual
allocation, and once more per agent to obtain an unbiased estimate of that agent's payment.

Our main innovation is a transformation that uses the same random sampling trick, but only needs to
evaluate the allocation function once. Assume that a parameter $\mu \in (0, 1)$ is given. For every player, with
probability $1 - \mu$, we leave their bid unchanged; with probability $\mu$, we sample a smaller bid value. The
allocation rule is invoked on these bids. An agent is always charged her reported value of the outcome,
but if her bid was replaced with a smaller bid value then we refund her an amount equal to an unbiased
estimator of the integral in Eq. (4), scaled by $1/\mu$ to counterbalance the fact that the refund is only being
applied with probability $\mu$. A naive application of this plan suffers from the following defect: the random
resampling of bids modifies the expected allocation vector, so we need to obtain an unbiased estimator of the
integral of the modified allocation function. However, if we change our sampling procedure to obtain such
an estimate, then this modifies the allocation function once again, so we will still be estimating the wrong
integral! What we need is a "fixed point" of this process of redefining the sampling procedure. Below, we
give a definition of self-resampling procedures that satisfy the requisite fixed point property, and we give two
simple constructions of self-resampling procedures. A key feature of our definition is that a self-resampling
procedure actually transforms a single bid into two correlated random values: one to be used in computing
the allocation, the other to be used (together with the allocation itself) in computing payments.

Thus, the formal description of our procedure consists of three parts: (i) a method for estimating inte- 
grals by evaluating the integrand at a randomly sampled point, (ii) the definition and construction of self- 
resampling procedures, (iii) the generic transformation that uses the foregoing two ingredients to convert 
any monotone allocation rule into a truthful-in-expectation randomized mechanism. We now specify the 
details of each of these three parts. 

3.1 Estimating integrals via random sampling 

Let $I$ be a nonempty open interval in $\mathbb{R}$ (possibly with infinite endpoints) and let $g$ be a function defined on
$I$. Let us describe a procedure for estimating the integral $\int_I g(z)\, dz$ by evaluating $g$ at a single randomly
sampled point of $I$. The procedure is well known; we describe it here for the purpose of giving a
self-contained exposition of our algorithm.

Theorem 3.1. Let $F : I \to [0, 1]$ be any strictly increasing function that is differentiable and satisfies
$\inf_{z \in I} F(z) = 0$, $\sup_{z \in I} F(z) = 1$. If $Y$ is a random variable with cumulative distribution function $F$,
then the expected value of $g(Y) / F'(Y)$ is equal to $\int_I g(z)\, dz$.

Proof. Since $\inf_{z \in I} F(z) = 0$ and $\sup_{z \in I} F(z) = 1$, it follows that the random variable $Y$ is supported
on the entire interval $I$. Our assumption that $F$ is differentiable implies that $Y$ has a probability density
function, namely $F'(z)$. Thus, for any function $h$, the expectation of $h(Y)$ is given by $\int_I h(z) F'(z)\, dz$.
Applying this formula to the function $h(z) = g(z)/F'(z)$ one obtains the theorem. □
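
The estimator of Theorem 3.1 can be checked numerically. The Python sketch below is only an illustration under assumptions of our choosing: it takes $I = (0, \infty)$, $g(z) = e^{-2z}$ (whose integral over $I$ is $1/2$), and $F(z) = 1 - e^{-z}$, so that $Y$ is exponentially distributed with density $F'(z) = e^{-z}$. The function names are hypothetical.

```python
import math
import random


def single_sample_estimate(g, sample_y, density, rng):
    """One draw of g(Y) / F'(Y), where Y has density F'."""
    y = sample_y(rng)
    return g(y) / density(y)


def demo_mean(num_samples=100_000, seed=0):
    """Average many single-sample estimates of the integral of exp(-2z) on (0, inf)."""
    rng = random.Random(seed)
    g = lambda z: math.exp(-2.0 * z)           # integrand; true integral is 1/2
    sample_y = lambda r: r.expovariate(1.0)    # Y with CDF F(z) = 1 - exp(-z)
    density = lambda z: math.exp(-z)           # F'(z)
    total = sum(single_sample_estimate(g, sample_y, density, rng)
                for _ in range(num_samples))
    return total / num_samples


if __name__ == "__main__":
    print(demo_mean())
```

Each individual estimate is noisy, but it is unbiased, so the empirical mean approaches $1/2$ as the number of samples grows. The mechanism described in this section uses a single such sample per agent.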

3.2 Self-resampling procedures 

The basic ingredient of our generic transformation is a procedure for taking a bid $b_i$ and a random seed
$w_i$, and producing two random numbers $x_i(b_i; w_i)$, $y_i(b_i; w_i)$. The mechanism will use $\{x_i(b_i; w_i)\}_{i \in N}$ for
determining the allocation and additionally $y_i(b_i; w_i)$ for determining the payment it charges agent $i$. To
prove that the mechanism is truthful in expectation we will require the following properties.³

Definition 3.2. Let $I$ be a nonempty interval in $\mathbb{R}$. A self-resampling procedure with support $I$ and
resampling probability $\mu \in (0, 1)$ is a randomized algorithm with input $b_i \in I$, random seed $w_i$, and output
$x_i(b_i; w_i), y_i(b_i; w_i) \in I$, that satisfies the following properties:

1. For every fixed $w_i$, the functions $x_i(b_i; w_i)$, $y_i(b_i; w_i)$ are non-decreasing functions of $b_i$.

2. With probability $1 - \mu$, $x_i(b_i; w_i) = y_i(b_i; w_i) = b_i$. Otherwise $x_i(b_i; w_i) \le y_i(b_i; w_i) < b_i$.

3. The conditional distribution of $x_i(b_i; w_i)$, given that $y_i(b_i; w_i) = b_i' < b_i$, is the same as the
unconditional distribution of $x_i(b_i'; w_i)$. In other words,

$$\Pr[x_i(b_i; w_i) < a_i \mid y_i(b_i; w_i) = b_i'] = \Pr[x_i(b_i'; w_i) < a_i], \quad \forall a_i \le b_i' < b_i.$$

4. Consider the two-variable function

$$F(a_i, b_i) = \Pr[y_i(b_i; w_i) < a_i \mid y_i(b_i; w_i) < b_i],$$

which we will call the distribution function of the self-resampling procedure. For each $b_i$, the function
$F(\cdot, b_i)$ must be differentiable and strictly increasing on the interval $I \cap (-\infty, b_i)$.

As it happens, it is easier to construct self-resampling procedures with support $\mathbb{R}_+$, and one such
construction that we call the canonical self-resampling procedure (Algorithm 1) forms the basis for our general
construction. We defer further discussion of self-resampling until after we have described and analyzed the
generic transformation.

3.3 The generic transformation 

Suppose we are given a monotone allocation rule $\mathcal{A}$ and for each agent $i \in N$ a self-resampling procedure
that has resampling probability $\mu \in (0, 1)$, support $T_i$, and output values $f_i = (x_i, y_i)$. Let $F_i(a_i, b_i)$ denote
the distribution function of the self-resampling procedure for agent $i$, and let $F_i'(a_i, b_i)$ denote the partial
derivative $\partial F_i(a_i, b_i) / \partial a_i$.

Our generic transformation combines these ingredients into a randomized mechanism
$\tilde{\mathcal{M}} = \text{AllocToMech}(\mathcal{A}, \mathbf{f})$ that works as follows:

1. It solicits bid vector $b \in T$.

³ To keep the notation consistent, we state Definition 3.2 for a given agent $i$. Strictly speaking, the subscript $i$ is not necessary.
Algorithm 1 The canonical self-resampling procedure.

function Recursive($b_i$)
    with probability $1 - \mu$
        return $b_i$.
    else
        Pick $b_i' \in [0, b_i]$ uniformly at random.
        return Recursive($b_i'$).

Input: bid $b_i \in [0, \infty)$, parameter $\mu \in (0, 1)$.
Output: $(x_i, y_i)$ such that $0 \le x_i \le y_i \le b_i$.
with probability $1 - \mu$
    $x_i \leftarrow b_i$, $y_i \leftarrow b_i$.
else
    Pick $b_i' \in [0, b_i]$ uniformly at random.
    $x_i \leftarrow$ Recursive($b_i'$), $y_i \leftarrow b_i'$.
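
A direct transcription of Algorithm 1 into Python may help in experimenting with the procedure. This is a minimal sketch, not a reference implementation: the function names are hypothetical, the random seed $w_i$ is represented by a `random.Random` instance passed in by the caller, and the recursion is unrolled into a loop (each iteration corresponds to one recursive call).

```python
import random


def recursive(b, mu, rng):
    """Recursive(b): keep b with probability 1 - mu, else recurse on a uniform draw from [0, b]."""
    while rng.random() < mu:       # with probability mu, resample a lower value
        b = rng.uniform(0.0, b)
    return b


def canonical_self_resample(b, mu, rng):
    """Return (x, y) with 0 <= x <= y <= b, following Algorithm 1."""
    if rng.random() >= mu:         # with probability 1 - mu the bid is left unchanged
        return b, b
    y = rng.uniform(0.0, b)        # y is the first resampled value b'
    x = recursive(y, mu, rng)      # x continues resampling starting from b'
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    print(canonical_self_resample(10.0, 0.2, rng))
```

Conditional on resampling, $y_i$ is uniform on $[0, b_i]$, so the distribution function of this procedure is $F(a_i, b_i) = a_i / b_i$ with derivative $1 / b_i$. Given $y_i = b_i'$, the value $x_i$ is produced by exactly the same process that generates $x_i(b_i'; w_i)$, which is the fixed-point requirement of Property 3.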



2. It executes each agent's self-resampling procedure using an independent random seed $w_i$, to obtain
two vectors of modified bids $x = (x_1(b_1; w_1), \ldots, x_n(b_n; w_n))$ and $y = (y_1(b_1; w_1), \ldots, y_n(b_n; w_n))$.

3. It then allocates⁴ according to $\mathcal{A}(x)$.

4. Each agent $i$ is charged the amount $b_i \cdot \mathcal{A}_i(x) - R_i$, where the rebate $R_i$ is defined as

$$R_i = \begin{cases} \dfrac{1}{\mu} \cdot \dfrac{\mathcal{A}_i(x)}{F_i'(y_i, b_i)} & \text{if } y_i < b_i, \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

We are now ready to present our main result:

Theorem 3.3. Let $\mathcal{A}$ be a monotone allocation rule. Suppose we are given an ensemble $\mathbf{f}$ of self-resampling
procedures $f_i = (x_i, y_i)$ for each agent $i$, each with resampling probability $\mu \in (0, 1)$. Then the mechanism
$\tilde{\mathcal{M}} = (\tilde{\mathcal{A}}, \tilde{\mathcal{P}}) = \text{AllocToMech}(\mathcal{A}, \mu, \mathbf{f})$ has the following properties.

(a) $\tilde{\mathcal{M}}$ is truthful and universally ex-post individually rational.

(b) For $n$ agents and any bid vector $b$ (and any fixed random seed of nature) allocations $\tilde{\mathcal{A}}(b)$ and $\mathcal{A}(b)$
are identical with probability at least $1 - n\mu$.

(c) If $T = \mathbb{R}_+^n$ (all types are positive), and each $f_i$ is the canonical self-resampling procedure, then
mechanism $\tilde{\mathcal{M}}$ is ex-post no-positive-transfers, and never pays any agent $i$ more than $b_i \cdot \mathcal{A}_i(x) \cdot (\frac{1}{\mu} - 1)$.

Remark 3.4. Several remarks are in order.

(1) The mechanism never explicitly computes the payment for agent $i$ (Eq. (4)) but rather implicitly creates
the correct expected payments through its randomization of the bids.

(2) The mechanism only invokes the original allocation rule $\mathcal{A}$ once. This property is very useful when it
is impossible to invoke the allocation rule more than once, e.g. for multi-armed bandit allocations.

⁴ If $\mathcal{A}$ itself is randomized and/or if there is randomness arising from nature, then we allocate according to $\mathcal{A}(x; w, r)$ and we
assume that $w, r$ are independent of the random seeds $w_i$ used in the resampling step.
(3) The mechanism $\tilde{\mathcal{M}}$ is randomized even if $\mathcal{A}$ is deterministic. It is truthful in expectation over the
randomness used by the self-resampling procedures.

(4) If $\mathcal{A}$ is ex-post monotone, then $\tilde{\mathcal{M}}$ will be ex-post truthful. To see this, fix nature's random seed $r$ and
apply Theorem 3.3 to the allocation rule $\mathcal{A}_r$ induced by this $r$.

(5) If agents' types are positive then by part (b), the welfare of $\tilde{\mathcal{M}}$ is at least $1 - n\mu$ times that of $\mathcal{A}$.
Parameter $\mu$ controls the trade-off between the loss in welfare and the size of the rebates $R_i$. Further
results on bounding the welfare loss are presented in Section [ 



(6) By definition of the payment rule, the mechanism is universally ex-post normalized. We will not 
explicitly mention this property in the subsequent applications. 
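
The four steps of AllocToMech can be sketched in Python for the positive-type case of Theorem 3.3(c), where the canonical procedure has $F(a_i, b_i) = a_i/b_i$ and hence $F'_i(y_i, b_i) = 1/b_i$. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `canonical_resample`, `alloc_to_mech` and the toy monotone rule `proportional_share` are hypothetical and introduced only for illustration.

```python
import random

def canonical_resample(bid, mu, rng):
    """Canonical self-resampling: return (x, y) with 0 <= x <= y <= bid."""
    if rng.random() < 1.0 - mu:
        return bid, bid                      # keep the bid with probability 1 - mu
    resampled = rng.uniform(0.0, bid)        # the resampled bid b'
    x, _ = canonical_resample(resampled, mu, rng)
    return x, resampled

def alloc_to_mech(alloc_rule, bids, mu, rng):
    """One run of the transformation; returns (allocation, amounts charged)."""
    pairs = [canonical_resample(b, mu, rng) for b in bids]
    x = [p[0] for p in pairs]
    y = [p[1] for p in pairs]
    allocation = alloc_rule(x)               # the only call to the allocation rule
    charged = []
    for b, y_i, a_i in zip(bids, y, allocation):
        # Canonical procedure: F'(y, b) = 1 / b, so the rebate is a_i * b / mu.
        rebate = a_i * b / mu if y_i < b else 0.0
        charged.append(b * a_i - rebate)     # negative values mean the agent is paid
    return allocation, charged

def proportional_share(x):
    """Hypothetical monotone rule (illustration only): A_i(x) = x_i / (1 + sum(x))."""
    total = 1.0 + sum(x)
    return [x_i / total for x_i in x]
```

For example, `alloc_to_mech(proportional_share, [1.0, 2.0, 3.0], 0.1, random.Random(0))` returns one realized allocation together with the realized charges. Each charge is either $b_i \mathcal{A}_i(x)$ or $b_i \mathcal{A}_i(x)(1 - 1/\mu)$, in line with part (c) of the theorem.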

Proof of Theorem 3.3. To prove that $\mathcal{M}$ is truthful, we need to prove two things: that the randomized allocation rule $\tilde{\mathcal{A}}$ is monotone, and that the expected payment rule $\tilde{\mathcal{P}}$ satisfies

$$\tilde{\mathcal{P}}_i(b) = b_i \, \tilde{\mathcal{A}}_i(b_{-i}, b_i) - \int_{-\infty}^{b_i} \tilde{\mathcal{A}}_i(b_{-i}, u) \, du. \qquad (6)$$

Recall that when we write $\tilde{\mathcal{A}}_i(b_{-i}, u)$ without indicating the dependence on the combined random seed $q = (w_1, \ldots, w_n, w, r)$, it means that we are referring to the unconditional expectation of $\tilde{\mathcal{A}}_i(b_{-i}, u; q)$.

The monotonicity of the randomized allocation rule $\tilde{\mathcal{A}}$ follows from the monotonicity of $\mathcal{A}$ and the monotonicity property (Property 1) in the definition of a self-resampling procedure. To prove that $\tilde{\mathcal{P}}_i$ satisfies Eq. (6), we begin by recalling that the payment charged to player $i$ is $b_i \mathcal{A}_i(x) - R_i$, where the rebate $R_i$ is defined by Eq. (5). The expectation of $b_i \mathcal{A}_i(x)$ is simply $b_i \tilde{\mathcal{A}}_i(b_{-i}, b_i)$, so to conclude the proof of truthfulness we must show that

$$\mathbb{E}[R_i] = \int_{-\infty}^{b_i} \tilde{\mathcal{A}}_i(b_{-i}, u) \, du. \qquad (7)$$

Our proof of Eq. (7) begins by observing that the conditional distribution of $x_i$, given that $y_i = u < b_i$, is the same as the unconditional distribution of $x_i(u; w_i)$, by Property 3 of a self-resampling procedure. Combining this with the fact that the random seed $w_i$ is independent of $\{w_j : j \neq i\}$, we find that the conditional distribution of the tuple $x = (x_{-i}, x_i)$, given that $y_i = u$, is the same as the unconditional distribution of the vector $\tilde{x}$ of modified bids that $\mathcal{M}$ would input into the allocation function $\mathcal{A}$ if the bid vector were $(b_{-i}, u)$ instead of $(b_{-i}, b_i)$. Taking expectations, this implies that for all $u < b_i$, we have

$$\mathbb{E}[\mathcal{A}_i(x) \mid y_i = u] = \mathbb{E}[\mathcal{A}_i(\tilde{x})] = \tilde{\mathcal{A}}_i(b_{-i}, u).$$

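Now apply Theorem 3.1 with the function $g(u) = \tilde{\mathcal{A}}_i(b_{-i}, u)$. Recalling that $F_i(\cdot, b_i)$ is the cumulative distribution function of $y_i$ given that $y_i < b_i$, we apply the theorem to obtain

$$\int_{-\infty}^{b_i} \tilde{\mathcal{A}}_i(b_{-i}, u) \, du = \mathbb{E}\left[ \frac{\tilde{\mathcal{A}}_i(b_{-i}, y_i)}{F'_i(y_i, b_i)} \,\middle|\, y_i < b_i \right] = \mathbb{E}\left[ \frac{\mathcal{A}_i(x)}{F'_i(y_i, b_i)} \,\middle|\, y_i < b_i \right] = \mu \cdot \mathbb{E}[R_i \mid y_i < b_i], \qquad (8)$$

where the second equation follows from the equation derived at the end of the preceding paragraph, averaging over all $u < b_i$. Observing that $R_i = 0$ unless $y_i < b_i$, an event that has probability $\mu$, we see that $\mathbb{E}[R_i] = \mu \cdot \mathbb{E}[R_i \mid y_i < b_i]$. Combined with Eq. (8), this establishes Eq. (7) and completes the proof that $\mathcal{M}$ is truthful.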

Mechanism $\mathcal{M}$ is universally ex-post individually rational because agent $i$ is never charged an amount greater than $b_i \tilde{\mathcal{A}}_i(b; q)$. Part (b) follows from the union bound: the probability that $x_i = b_i$ for all $i$ is at least $1 - n\mu$. For part (c), note that by Proposition 3.5 the canonical self-resampling procedure has distribution function $F(a_i, b_i) = a_i/b_i$, hence $F'_i(y_i, b_i) = 1/b_i$ for all $i$, $y_i$, $b_i$. The rebate $R_i$ is equal either to $0$ or to $\frac{1}{\mu} \cdot \frac{\mathcal{A}_i(x)}{F'_i(y_i, b_i)} = b_i \cdot \mathcal{A}_i(x) \cdot \frac{1}{\mu}$. We also charge $b_i \cdot \mathcal{A}_i(x)$ to agent $i$. The claimed upper bound on the amount paid to agent $i$ follows by combining these two terms. □
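
As a sanity check of Eq. (7), consider a hypothetical single-agent instance (an illustrative assumption, not taken from the paper): positive types, the canonical self-resampling procedure, and the linear rule $\mathcal{A}(x) = c\,x$ with $c > 0$. Using the expectations computed later in Lemma 3.10, namely $\mathbb{E}[x \mid y < b] = b\,\frac{1-\mu}{2-\mu}$ and $\mathbb{E}[x(u; w)] = u\,(1 - \frac{\mu}{2-\mu})$, both sides of Eq. (7) can be evaluated in closed form, with the integral starting at $0$ because the support is positive:

```latex
\begin{align*}
\mathbb{E}[R]
  &= \mu \cdot \mathbb{E}\!\left[ \tfrac{1}{\mu}\, b\, \mathcal{A}(x) \,\middle|\, y < b \right]
   = c\, b \cdot \mathbb{E}[x \mid y < b]
   = c\, b^2 \cdot \frac{1-\mu}{2-\mu}, \\
\int_0^{b} \tilde{\mathcal{A}}(u)\, du
  &= \int_0^{b} c\, u \left( 1 - \frac{\mu}{2-\mu} \right) du
   = \frac{c\, b^2}{2} \cdot \frac{2 - 2\mu}{2-\mu}
   = c\, b^2 \cdot \frac{1-\mu}{2-\mu}.
\end{align*}
```

The two expressions agree for every $\mu \in (0, 1)$, as Eq. (7) requires.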
3.4 More on self-resampling procedures

First we analyze Algorithm 1, then we proceed to the case of general support.

Proposition 3.5. Algorithm 1 is a self-resampling procedure with support $\mathbb{R}_+$ and resampling probability $\mu$. The distribution function for this procedure is $F(a_i, b_i) = a_i/b_i$.

Proof Sketch. Properties 1 and 2 in Definition 3.2 are immediate from the description of the algorithm. Property 3 follows from the recursive nature of the sampling procedure: the event $y_i(b_i; w_i) = b'_i < b_i$ implies that the algorithm has followed the "else" branch on Line 11, and has chosen $b'_i$ in Line 12. Finally, the distribution function is $F(a_i, b_i) = a_i/b_i$ since, conditional on the event $y_i(b_i; w_i) < b_i$, the distribution of $y_i(b_i; w_i)$ is uniform in the interval $[0, b_i]$. Property 4 follows trivially. □

Recall that Algorithm 1 is recursive; henceforth let us call it Recursive$_\mu$ for clarity. An equivalent explicit (non-recursive) version, called OneShot$_\mu$, can be stated as follows:

Algorithm 2 OneShot$_\mu$: a non-recursive version of Recursive$_\mu$.
1: Input: bid $b_i \in [0, \infty)$, parameter $\mu \in (0, 1)$.
2: Output: $(x_i, y_i)$ such that $0 \le x_i \le y_i \le b_i$.

3: with probability $1 - \mu$

4: $x_i \leftarrow b_i$, $y_i \leftarrow b_i$.

5: else

6: Pick $\gamma_1, \gamma_2 \in [0, 1]$ independently, uniformly at random.

7: $x_i \leftarrow b_i \cdot \gamma_1^{1/(1-\mu)}$, $y_i \leftarrow b_i \cdot \max\{\gamma_1^{1/(1-\mu)}, \gamma_2^{1/\mu}\}$.


Proposition 3.6. Recursive$_\mu$ and OneShot$_\mu$ generate the same output distribution: for any bid $b_i \in [0, \infty)$, the joint distribution of the pair $(x_i, y_i) = (x_i(b_i; w_i), y_i(b_i; w_i))$ is the same for both procedures.

This equivalence (whose proof is deferred to Appendix A) is further used for computing the integrals in the proofs of Theorem 3.3(c) and Theorem 4.1.
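
The equivalence in Proposition 3.6 can be checked empirically. The sketch below is a Monte Carlo comparison, not a proof. It assumes the exponents $1/(1-\mu)$ on $\gamma_1$ and $1/\mu$ on $\gamma_2$ in line 7 of OneShot$_\mu$; with these exponents $y_i/b_i$ is uniform conditional on resampling, as Proposition 3.5 requires. The function names are illustrative.

```python
import random

def recursive(b, mu, rng):
    """Recursive form of the canonical self-resampling procedure."""
    if rng.random() < 1.0 - mu:
        return b, b
    y = rng.uniform(0.0, b)
    x, _ = recursive(y, mu, rng)
    return x, y

def one_shot(b, mu, rng):
    """Non-recursive form; the exponents are an assumption stated in the lead-in."""
    if rng.random() < 1.0 - mu:
        return b, b
    g1, g2 = rng.random(), rng.random()
    x_ratio = g1 ** (1.0 / (1.0 - mu))
    y_ratio = max(x_ratio, g2 ** (1.0 / mu))
    return b * x_ratio, b * y_ratio

def summary(proc, b, mu, n, seed):
    """Empirical E[x], E[y] and Pr[x = y < b] over n independent draws."""
    rng = random.Random(seed)
    samples = [proc(b, mu, rng) for _ in range(n)]
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    frac_once = sum(1 for x, y in samples if x == y and y < b) / n
    return mean_x, mean_y, frac_once
```

Calling `summary(recursive, 1.0, 0.3, 100000, 1)` and `summary(one_shot, 1.0, 0.3, 100000, 2)` should give statistics that agree up to sampling noise, with the mean of $x_i$ close to $b_i (1 - \frac{\mu}{2-\mu})$.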

Now, to construct a self-resampling procedure with support in an arbitrary interval $I$, we can use the following technique. Suppose $h : (0, 1] \times I \to I$ is a two-variable function such that the partial derivatives $\partial h(z_i, b_i)/\partial z_i$ and $\partial h(z_i, b_i)/\partial b_i$ are well-defined and strictly positive at every point $(z_i, b_i) \in (0, 1] \times I$. Suppose furthermore that $h(1, b_i) = b_i$ and $\lim_{z_i \to 0} h(z_i, b_i) = \inf(I)$ for all $b_i \in I$. Then we define the h-canonical self-resampling procedure $(x^h_i, y^h_i)$ with support $I$, by specifying that

$$\begin{cases} x^h_i(b_i; w_i) = h(x_i(1; w_i), b_i) \\ y^h_i(b_i; w_i) = h(y_i(1; w_i), b_i), \end{cases} \qquad (9)$$

where $(x_i, y_i)$ is the canonical self-resampling procedure as defined in Algorithm 1.

Proposition 3.7. $(x^h_i, y^h_i)$ as defined in Eq. (9) is a self-resampling procedure with support $I$ and resampling probability $\mu$. The distribution function for $(x^h_i, y^h_i)$ is the unique two-variable function $F(a_i, b_i)$ such that

$$h(F(a_i, b_i), b_i) = a_i \quad \text{for all } a_i, b_i \in I,\ a_i < b_i. \qquad (10)$$

Proof. Property 1 in Definition 3.2 holds because of the monotonicity of $h$, Property 2 holds because $h(1, b_i) = b_i$ for all $b_i$, and Property 3 holds because the function $h$ is deterministic and monotone.
Let $F_h(a_i, b_i)$ and $F_0(a_i, b_i)$ be the distribution functions for the h-canonical and canonical self-resampling procedures, respectively. Recall that $F_0(a_i, b_i) = a_i/b_i$ by Proposition 3.5. Note that $F(a_i, b_i)$ in Eq. (10) is unique (and hence well-defined) by the strict monotonicity of $h$.

The claim that $F_h(a_i, b_i) = F(a_i, b_i)$ follows easily from Eq. (10). By definition of $h$ it holds that

$$h(y_i(1, w_i), b_i) < b_i \iff y_i(1, w_i) < 1.$$

Therefore, letting $y_i = y_i(1, w_i)$ we have

$$\begin{aligned} F_h(a_i, b_i) &= \Pr[h(y_i, b_i) < a_i \mid h(y_i, b_i) < b_i] \\ &= \Pr[y_i < F(a_i, b_i) \mid y_i < 1] \\ &= F_0(F(a_i, b_i), 1) \\ &= F(a_i, b_i). \end{aligned}$$

Our assumption that $h$ is differentiable and strictly increasing in its first argument now implies that the same property holds for $F$, which verifies Property 4. □
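
The construction in Eq. (9) can be sketched in Python. The example below assumes the OneShot form of the canonical procedure on bid 1 and uses, purely as an illustration, $h(z_i, b_i) = b_i/\sqrt{z_i}$ on negative bids (the choice that appears in Theorem 3.11). For this $h$, solving Eq. (10) gives $F(a_i, b_i) = (b_i/a_i)^2$ for $a_i \le b_i < 0$. The function names are hypothetical.

```python
import math
import random

def canonical_unit(mu, rng):
    """Canonical procedure on bid 1 (OneShot form); returns (x, y) in (0, 1]."""
    if rng.random() < 1.0 - mu:
        return 1.0, 1.0
    g1 = 1.0 - rng.random()          # uniform on (0, 1], avoids an exact zero
    g2 = 1.0 - rng.random()
    x = g1 ** (1.0 / (1.0 - mu))
    return x, max(x, g2 ** (1.0 / mu))

def h_canonical(h, b, mu, rng):
    """Eq. (9): push both canonical outputs for bid 1 through h(., b)."""
    x, y = canonical_unit(mu, rng)
    return h(x, b), h(y, b)

def h_neg(z, b):
    """Illustrative choice h(z, b) = b / sqrt(z), intended for b < 0."""
    return b / math.sqrt(z)
```

With `b = -1.0`, the fraction of resampled draws whose second output is at most `-2.0` should be close to $F(-2, -1) = 1/4$, and every draw should satisfy $x^h_i \le y^h_i \le b_i$.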

3.5 Bounds on welfare 

In this section we present bounds on the welfare obtained by our generic transformation for two interesting special cases: positive-only types and negative-only types. We consider the approximation that is achieved by the mechanism as a function of the approximation of the original allocation rule. As per Theorem 3.3(b), the generic transformation creates a mechanism with an allocation that is identical to the original allocation with probability at least $1 - n\mu$. For positive-only types this implies a bound on the approximation which degrades with $n$, the number of agents (see Remark 3.4(5)). For negative-only types no such bound follows immediately, since the cost in the low-probability event might be prohibitively high. For both settings, we present similar bounds that do not degrade with $n$.

Positive-only types. We first consider the generic procedure for positive-only types: $T = \mathbb{R}^n_+$, where $\mathbb{R}_+ = (0, \infty)$. Recall that for agents' types $t \in T$ the social welfare of an outcome $o$ is defined to be $SW(o, t) = \sum_{i \in N} t_i \, a_i(o)$. The optimal social welfare is $OPT(t) = \max_{o \in O} SW(o, t)$, where $O$ is the set of all feasible outcomes. (A mechanism with) an allocation rule $\mathcal{A}$ is $\alpha$-approximate if it holds that

$$\alpha \cdot \mathbb{E}[SW(\mathcal{A}(t), t)] \ge OPT(t) \quad \text{for every } t. \qquad (11)$$

Theorem 3.8. Consider the setting in Theorem 3.3(c), so that $T = \mathbb{R}^n_+$ and each $f_i$ is the canonical self-resampling procedure. If allocation rule $\mathcal{A}$ is $\alpha$-approximate, then mechanism AllocToMech($\mathcal{A}$, $\mu$, $f$) is $\alpha/(1 - \frac{\mu}{2-\mu})$-approximate.

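Proof. Fix a bid vector $b$, and let $o^*$ be the corresponding optimal allocation. Recall that our mechanism outputs allocation $\mathcal{A}(x)$, where $x$ is the vector of randomly modified bids. As the original allocation rule $\mathcal{A}$ is $\alpha$-approximate, by Eq. (11) it holds that $\alpha \cdot \mathbb{E}[SW(\mathcal{A}(x), x)] \ge OPT(x)$ for every fixed $x$. We show (see Lemma 3.10 below) that

$$\mathbb{E}[x_i] = \left(1 - \frac{\mu}{2-\mu}\right) b_i \quad \text{for each agent } i. \qquad (12)$$

Thus when we evaluate $o^*$ with respect to bids $x$ and take expectations over $x$, we get:

$$\begin{aligned} \alpha \cdot \mathbb{E}[SW(\mathcal{A}(x), x)] &\ge \mathbb{E}[OPT(x)] \\ &\ge \mathbb{E}[SW(o^*, x)] \\ &= \sum_i \left(1 - \frac{\mu}{2-\mu}\right) b_i \, a_i(o^*) \\ &= \left(1 - \frac{\mu}{2-\mu}\right) OPT(b). \qquad \square \end{aligned}$$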

Remark 3.9. For arbitrary self-resampling procedures $f_i$ with support $\mathbb{R}_+$, Eq. (12) can be replaced by $\mathbb{E}[x_i] \ge (1 - \mu) \, b_i$, which gives a slightly weaker result, namely an $\alpha/(1 - \mu)$-approximation to the social welfare.

Lemma 3.10. In the setting of Theorem 3.8, letting $x$ be the vector of modified types, Eq. (12) holds.

Proof. Let us use OneShot$_\mu$ to describe the canonical self-resampling procedure. Recall that OneShot$_\mu$ generates $x_i = x_i(b_i; w_i)$ by setting $x_i = b_i$ with probability $1 - \mu$, and otherwise sampling $\gamma_1$ uniformly at random in $[0, 1]$ and outputting $x_i = b_i \cdot \gamma_1^{1/(1-\mu)}$. Hence

$$\mathbb{E}[x_i \mid x_i < b_i] = b_i \int_0^1 \gamma_1^{1/(1-\mu)} \, d\gamma_1 = b_i \cdot \frac{1}{1 + 1/(1-\mu)} = b_i \left(1 - \frac{1}{2-\mu}\right),$$

$$\mathbb{E}[x_i] = (1 - \mu) \cdot b_i + \mu \cdot \mathbb{E}[x_i \mid x_i < b_i] = b_i \left(1 - \frac{\mu}{2-\mu}\right). \qquad \square$$
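
For a sense of scale, here is Eq. (12) evaluated at one value of the resampling probability. The value $\mu = 0.1$ is an arbitrary choice for illustration and does not come from the paper:

```latex
\begin{align*}
\mu = 0.1: \qquad
\mathbb{E}[x_i] &= \left( 1 - \frac{0.1}{1.9} \right) b_i \approx 0.947\, b_i, \\
\frac{\alpha}{1 - \mu/(2-\mu)} &= \frac{1.9}{1.8}\, \alpha \approx 1.056\, \alpha, \\
\frac{\alpha}{1 - \mu} &= \frac{\alpha}{0.9} \approx 1.111\, \alpha .
\end{align*}
```

The last line is the weaker bound of Remark 3.9 for arbitrary self-resampling procedures. Neither bound depends on the number of agents $n$.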

Negative-only types. We next consider the negative-only types setting: $T = \mathbb{R}^n_-$, where $\mathbb{R}_- = (-\infty, 0)$.

For negative types, approximation is defined with respect to the social cost, which is the negation of the social welfare. An algorithm is $\alpha$-approximate if for every input it outputs an outcome with cost at most $\alpha$ times the optimal cost. We present an approximation bound for an h-canonical self-resampling procedure, for a suitably chosen $h$.

Theorem 3.11. Consider the setting in Theorem 3.3. Assume that $T = \mathbb{R}^n_-$ and that each $f_i$ is the h-canonical self-resampling procedure, where $h(z_i, b_i) = b_i/\sqrt{z_i}$. Suppose $\mu \in (0, \frac{1}{2})$. If allocation rule $\mathcal{A}$ is $\alpha$-approximate, then mechanism AllocToMech($\mathcal{A}$, $\mu$, $f$) is $\alpha \left(1 + \frac{\mu}{1-2\mu}\right)$-approximate.

The proof of this theorem is almost identical to that of Theorem 3.8 and thus is omitted. The main modification is that Lemma 3.10 is replaced by the following lemma:

Lemma 3.12. In the setting of Theorem 3.11, letting $x^h$ be the vector of modified types, it holds that

$$\mathbb{E}[x^h_i] = b_i \left(1 + \frac{\mu}{1-2\mu}\right) \quad \text{for all } i.$$

Proof. Recall that $x^h_i$ is defined by Eq. (9). As in Lemma 3.10, we will use OneShot$_\mu$ to describe the canonical self-resampling procedure. It follows that

$$\mathbb{E}[x^h_i \mid x^h_i < b_i] = \int_0^1 h\big(\gamma_1^{1/(1-\mu)}, b_i\big) \, d\gamma_1 = \int_0^1 b_i \cdot \gamma_1^{-1/(2(1-\mu))} \, d\gamma_1 = b_i \left(1 + \frac{1}{1-2\mu}\right),$$

$$\mathbb{E}[x^h_i] = (1 - \mu) \cdot b_i + \mu \cdot \mathbb{E}[x^h_i \mid x^h_i < b_i] = b_i \left(1 + \frac{\mu}{1-2\mu}\right). \qquad \square$$
4 Some Applications

The VCG mechanism for shortest paths. The seminal paper of Nisan and Ronen posed the following question: is there a computational overhead in computing payments that induce agents to be truthful, compared to the computational burden of computing the allocation? One of their examples is the VCG mechanism for the shortest path mechanism design problem, where a naive computation of VCG payments requires the additional computation of $n$ shortest path instances. Yet an explicit payment computation is not the real goal; it is just a means to an end. The real goal is inducing the right incentives. Our procedure shows that, without any overhead in computation, one can induce the right incentives if we move to a randomized allocation rule and settle for truthfulness in expectation (and an approximately efficient allocation).

Specifically, the shortest path mechanism design problem is the following. We are given a graph $G = (V, E)$ and a pair of source-target nodes $(v_s, v_t)$. Each agent $e$ controls an edge $e \in E$ and has a cost $c_e > 0$ if picked (thus $t_e = -c_e < 0$ and $T_e = (-\infty, 0)$ for every $e$). That cost is private information, known only to agent $e$. The mechanism designer's goal is to pick a path $P$ from node $v_s$ to node $v_t$ in the graph with minimal total cost, that is, $\sum_{e \in P} c_e$ is minimal. Assume that there is no edge that forms a cut between $v_s$ and $v_t$.

The VCG mechanism is an efficient and truthful mechanism for this problem. It computes a shortest path $P$ with respect to the reported costs and pays each agent $e$ the difference between the cost of the shortest path that does not contain $e$ and the total cost of the shortest path excluding the cost of $e$. A naive implementation of the VCG mechanism requires computing $|P| + 1$ shortest path instances (where $|P|$ denotes the number of edges in path $P$). VCG is deterministic, truthful and efficient (1-approximation).

Let EFF be an efficient allocation rule for the shortest path problem. We can use our general procedure to derive the following result (its proof follows directly from Theorem 3.3 and the application of Theorem 3.11 to EFF, which is an efficient allocation rule and thus a 1-approximation).

Theorem 4.1. Fix any $\mu \in (0, \frac{1}{2})$. For each agent $i$, let $f_i$ be the h-canonical self-resampling procedure, where $h(z_i, b_i) = b_i/\sqrt{z_i}$. Let $\mathcal{M}$ = AllocToMech(EFF, $\mu$, $\{f_i\}$) be the mechanism created by applying AllocToMech() to EFF. Then $\mathcal{M}$ has the following properties:

• It is truthful and universally individually rational.

• It only computes one shortest paths instance.

• It outputs a path with expected length at most $\left(1 + \frac{\mu}{1-2\mu}\right)$ times the length of the shortest path.

Remark 4.2. Recall that parameter $\mu$ controls the trade-off between approximation ratio and the rebate size $R_i$, which for a given random seed is proportional to $\frac{1}{\mu}$.
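
The mechanism of Theorem 4.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the graph is undirected and connected, the OneShot form of the canonical procedure is used, and the names `resample_negative`, `dijkstra_path` and `shortest_path_mechanism` are hypothetical. For $h(z_i, b_i) = b_i/\sqrt{z_i}$, Eq. (10) gives $F(a_i, b_i) = (b_i/a_i)^2$, so $F'(y_i, b_i) = -2 b_i^2 / y_i^3$, which is positive because $y_i < 0$.

```python
import heapq
import math
import random

def resample_negative(b, mu, rng):
    """h-canonical procedure with h(z, b) = b / sqrt(z) for a negative bid b."""
    if rng.random() < 1.0 - mu:
        return b, b
    g1 = 1.0 - rng.random()                      # uniform on (0, 1]
    g2 = 1.0 - rng.random()
    zx = g1 ** (1.0 / (1.0 - mu))
    zy = max(zx, g2 ** (1.0 / mu))
    return b / math.sqrt(zx), b / math.sqrt(zy)

def dijkstra_path(n, edges, cost, s, t):
    """Edge indices of a cheapest s-t path (graph assumed connected)."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    dist = [math.inf] * n
    prev = [None] * n
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                              # stale heap entry
        for v, idx in adj[u]:
            nd = d + cost[idx]
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = (u, idx)
                heapq.heappush(heap, (nd, v))
    path, node = [], t
    while node != s:
        node, idx = prev[node]
        path.append(idx)
    return path[::-1]

def shortest_path_mechanism(n, edges, costs, s, t, mu, rng):
    """Returns (path, amount paid to each edge agent) for one run."""
    bids = [-c for c in costs]                    # types are negated costs
    pairs = [resample_negative(b, mu, rng) for b in bids]
    modified_costs = [-x for x, _ in pairs]       # resampling can only inflate costs
    path = dijkstra_path(n, edges, modified_costs, s, t)   # single shortest-path call
    on_path = set(path)
    paid = []
    for e, (b, (_, y)) in enumerate(zip(bids, pairs)):
        a = 1.0 if e in on_path else 0.0
        # Rebate (1/mu) * a / F'(y, b) with F'(y, b) = -2 b^2 / y^3.
        rebate = a * (-(y ** 3)) / (2.0 * b * b * mu) if y < b else 0.0
        paid.append(-(b * a - rebate))            # the charge is b*a - rebate
    return path, paid
```

An agent on the chosen path receives its reported cost plus a nonnegative rebate, and agents off the path receive nothing. The only shortest-path computation is the single call to `dijkstra_path`.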

Communication overhead of payment computation. The paper [7] shows that there exists a monotone deterministic allocation rule for which the communication required for computing the allocation is a factor of $O(n)$ less than the communication required for computing prices. This implies that inducing the correct incentives deterministically has a large overhead in communication. Assume that instead of requiring explicit computation of payments we are satisfied with inducing the correct incentives using a randomized mechanism. In such a case our reduction shows that the deterministic lower bound cannot be extended to randomized mechanisms, if we allow a small error in the allocation.

More concretely, consider a single-parameter domain with types that are positive, $T_i = (0, \infty)$ (as in [1]). For all $i$, use the canonical self-resampling procedure. Consider any monotone allocation rule $\mathcal{A}$. We can apply Theorem 3.3 to obtain a randomized mechanism that is truthful, executes the allocation rule $\mathcal{A}$ only once (and thus has no communication overhead at all), and has exactly the same allocation with probability at least $(1 - \mu)^n$. For any $\epsilon > 0$ we can find $\mu > 0$ such that the error probability is less than $\epsilon$.
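
One explicit way to choose the resampling probability, spelled out here as a short worked step (the numeric values are illustrative, not from the paper), uses Bernoulli's inequality $(1-\mu)^n \ge 1 - n\mu$:

```latex
\Pr[\text{allocation differs}] \;\le\; 1 - (1-\mu)^n \;\le\; n\mu,
\qquad \text{so any } 0 < \mu < \epsilon / n \text{ gives error probability below } \epsilon .
```

For instance, with $n = 100$ agents and $\epsilon = 0.01$, any $\mu < 10^{-4}$ suffices.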
5 Multi-armed bandit mechanisms



In this section we apply the main result to multi-armed bandit (MAB) mechanisms: single-parameter mech- 
anisms in which the allocation rule is (essentially) an MAB algorithm parameterized by the bids. As in any 
single-parameter mechanism, agents submit their bids, then the allocation rule is run, and then the payments 
are assigned. This application showcases the full power of the main result, since in the MAB setting the 
allocation rule is only run once, and (in general) cannot be simulated as a computational routine without 
actually implementing the allocation. 

Focusing on the stochastic setting, we design truthful MAB mechanisms with the same regret guarantees as the best MAB algorithms such as UCB1 [5]. First, we prove that allocation rules derived from UCB1 and similar MAB algorithms are in fact monotone, and hence give rise to truthful MAB mechanisms. Second, we provide a new allocation rule with the same regret guarantees that is ex-post monotone, and hence gives rise to an ex-post truthful MAB mechanism. Third, we use this new allocation rule to obtain an unconditional separation between the power of randomized and deterministic ex-post truthful MAB mechanisms.

5.1 Preliminaries: MAB mechanisms 

An MAB mechanism [8, 10] operates as follows. There are $n$ agents. Each agent $i$ has a private value $v_i$ and submits a bid $b_i$. We assume that $b_i, v_i \in [0, b_{\max}]$, where $b_{\max}$ is known a priori. The allocation consists of $T$ rounds, where $T$ is the time horizon. In each round $t$ the allocation rule chooses one of the agents, call it $i = i(t)$, and observes a click reward $\pi(t) \in [0, 1]$; the chosen agent $i$ receives $v_i \, \pi(t)$ units of utility. Payments are assigned after the last round of the allocation. Note that the social welfare of the mechanism is equal to the total value-adjusted click reward: $\sum_{t=1}^{T} v_{i(t)} \, \pi(t)$.

The special case of 0-1 click rewards corresponds to the scenario in which agents are advertisers in a pay-per-click auction, and choosing agent $i$ in a given round $t$ means showing this agent's ad. Then the click reward $\pi(t)$ is the click bit: 1 if the ad has been clicked, and 0 otherwise. Following the web advertising terminology, we will say that in each round, an impression is allocated to one of the agents.

Formally, an MAB allocation rule $\mathcal{A}$ is an online algorithm parameterized by $n$, $T$, $b_{\max}$ and the bids $b$. In each round it allocates the impression and observes the click reward. Absent truthfulness constraints, the objective is to maximize the reported welfare: $\sum_{t=1}^{T} b_{i(t)} \, \pi(t)$. This formulation generalizes MAB algorithms: the latter are precisely MAB allocation rules with all bids set to 1.

Given an MAB algorithm $A$, there is a natural way to transform it into an MAB allocation rule $\mathcal{A}$. Namely, $\mathcal{A}$ runs algorithm $A$ with modified click rewards: if agent $i$ is chosen in round $t$ then the click reward reported to $A$ is $\tilde{\pi}(t) = (b_i/b_{\max}) \, \pi(t)$. We will say that algorithm $A$ induces allocation rule $\mathcal{A}$. From now on we will identify an MAB algorithm with the induced allocation rule, e.g. allocation rule UCB1 is induced by algorithm UCB1 [5].

We will focus on the stochastic MAB setting: in all rounds $t$ in which an agent $i$ is chosen, the click reward $\pi(t)$ is an independent random sample from some fixed distribution on $[0, 1]$ with expectation $\mu_i$.^5 Following the web advertisement terminology, we will call $\mu_i$ the click-through rate (CTR) of agent $i$. The CTRs are fixed, but no further information about them (such as priors) is revealed to the mechanism.

Regret. The performance of an MAB allocation rule is quantified in terms of regret:

$$R(T; b; \mu) = T \max_i [b_i \, \mu_i] - \mathbb{E}\left[ \sum_{t=1}^{T} b_{i(t)} \, \mu_{i(t)} \right],$$

the difference in expected click rewards between the algorithm and the benchmark: the best agent in hindsight, knowing the $\mu_i$'s. We focus on $R(T) = \max R(T; b; \mu)$, where the maximum is taken over all CTR vectors $\mu$ and all bid vectors $b$ such that $b_i \le 1$ for all $i$.^6

^5 The exact shape of this distribution is not essential. E.g. in the advertising example $\pi(t) \in \{0, 1\}$.

Regret guarantees from the vast literature on MAB algorithms easily translate to MAB allocation rules. In particular, allocation rule UCB1 has regret $R(T) = O(\sqrt{nT \log T})$ [5], which nearly matches the information-theoretically optimal regret bound $O(\sqrt{nT})$. The stochastic MAB setting tends to be easier if the best agent is much better than the second-best one. Let us sort the agents so that $b_1 \mu_1 \ge b_2 \mu_2 \ge \cdots \ge b_n \mu_n$. The gap $\delta$ of the problem instance is defined as $(b_1 \mu_1 - b_2 \mu_2)/b_{\max}$. The $\delta$-gap regret $R_\delta(T)$ is defined as the worst-case regret over all problem instances with gap $\delta$. Allocation rule UCB1 achieves $R_\delta(T) = O(\frac{n}{\delta} \log T)$ [5]; there is a lower bound $R_\delta(T) = \Omega(\min(\frac{n}{\delta} \log T, \sqrt{nT}))$ [13, 16, 12].

Click realizations. A click realization is a $k \times T$ table $\rho$ in which the $(i, t)$ entry $\rho_i(t)$ is the click reward (e.g., the click bit) that agent $i$ receives if it is played in round $t$. Note that in order to fully define the behavior of any algorithm on all bid vectors one may need to specify all entries in the table, whereas only a subset thereof is revealed in any given run. We view $\rho$ as a realization of nature's random seed. Thus, we can now define ex-post truthfulness and other ex-post properties: informally, an ex-post property is a property that holds for every given click realization.

For each agent $i$, round $t$, bid vector $b$ and click realization $\rho$, let $\mathcal{A}_i^t(b; \rho)$ (resp., $A_i^t(\rho)$) denote the probability that MAB allocation rule $\mathcal{A}$ (resp., MAB algorithm $A$) allocates the impression at round $t$ to agent $i$.

5.2 Truthfulness and monotonicity 

Theorem 3.3(c) reduces the problem of designing truthful MAB mechanisms to that of designing monotone MAB allocations. Let us state this reduction explicitly:

Theorem 5.1. Consider the stochastic MAB mechanism design problem. Let $\mathcal{A}$ be a stochastically monotone (resp., ex-post monotone) MAB allocation rule. Applying the transformation in Theorem 3.3(c)^7 to $\mathcal{A}$ with parameter $\mu$, we obtain a mechanism $\mathcal{M}$ such that:

(a) $\mathcal{M}$ is stochastically truthful (resp., ex-post truthful), ex-post no-positive-transfers, and universally ex-post individually rational.

(b) for each click realization, the difference in expected welfare between $\mathcal{A}$ and $\mathcal{M}$ is at most $\mu \, n \, b_{\max}$.

Remark 5.2. The theorem provides two distinct types of guarantees: game-theoretic guarantees in part (a), 
and performance guarantees in part (b). 

Remark 5.3. The performance guarantee in part (b) depends on $n$, the number of agents. If instead of expected welfare one considers regret $R(T)$, the dependence on $n$ can be eliminated, so that the difference between $\mathcal{A}$ and $\mathcal{M}$ is at most $\mu$. In fact, this result extends to regret (stochastic or ex-post) w.r.t. any given benchmark set of outcomes $O'$. To prove this, use (an additive version of) Theorem 3.8 with OPT redefined w.r.t. $O'$; we omit the details.

We show that a very general class of deterministic MAB algorithms induces monotone MAB allocation rules (to which Theorem 5.1 can be applied).

Definition 5.4. In a given run of an MAB algorithm, the round-$t$ statistics is a pair of vectors $(\pi, \nu)$, where the $i$-th component of $\pi$ (resp., $\nu$) is equal to the total payoff (resp., the number of impressions) of agent $i$ in rounds 1 to $t - 1$, for each agent $i$. Vectors $\pi$ and $\nu$ are called the p-stats vector and the i-stats vector, respectively.

^6 We define $R(T)$ with $b_{\max} = 1$ merely to simplify the notation. All regret bounds (scaled up by a factor of $b_{\max}$) hold for an arbitrary $b_{\max}$.

^7 Theorem 3.3(c) is stated for $T = \mathbb{R}^n_+$, but it trivially extends to the case $T = (0, b_{\max})^n$.
Definition 5.5. A deterministic MAB algorithm $A$ is called well-formed if for each round $t$ and each agent $i$, letting $(\pi, \nu)$ be the round-$t$ statistics, the following properties hold:

• $A_i^t(\rho)$ is determined by $(\pi, \nu)$: $A_i^t(\rho) = \chi_i(\pi, \nu)$ for any click realization $\rho$.^8

• [χ-monotonicity] $\chi_i(\pi, \nu)$ is non-decreasing in $\pi_i$ for any fixed $(\pi_{-i}, \nu)$.

• [χ-IIA] for each round $t$, any three distinct agents $\{i, j, l\}$ and any fixed $(\pi_{-i}, \nu_{-i})$, changing $(\pi_i, \nu_i)$ cannot transfer an impression from $j$ to $l$.

The χ-IIA property above is reminiscent of the Independence of Irrelevant Alternatives (IIA) property in the Social Choice literature (hence the name). A similar but technically different property is essential in the analysis of deterministic MAB allocation rules in [8].

Remark 5.6. For a concrete example of a well-formed MAB algorithm, consider (a version of) UCB1.^9 The algorithm is very simple: in each round $t$, it chooses agent

$$\min\left( \arg\max_i \left( \pi_i(t)/\nu_i(t) + \sqrt{8 \log(T)/\nu_i(t)} \right) \right).$$
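
The allocation rule induced by this version of UCB1 can be sketched in Python on a fixed table of click rewards, indexed by how many times each agent has been played. This is a sketch under stated assumptions: the treatment of agents that have not yet been played (index $+\infty$, so that they are tried first) is not specified in the text above and is an assumption here, and the names `ucb_index` and `ucb1_allocation` are hypothetical.

```python
import math

def ucb_index(total_reward, count, horizon):
    """Index with log T in place of log t; unplayed agents get +inf (an assumption)."""
    if count == 0:
        return math.inf
    return total_reward / count + math.sqrt(8.0 * math.log(horizon) / count)

def ucb1_allocation(bids, b_max, stack, horizon):
    """Run the induced allocation rule; stack[i][k] is agent i's (k+1)-th click reward.

    Returns the number of impressions allocated to each agent.
    """
    n = len(bids)
    nu = [0] * n                  # impressions so far
    pi = [0.0] * n                # total modified click reward so far
    for _ in range(horizon):
        scores = [ucb_index(pi[j], nu[j], horizon) for j in range(n)]
        best = max(scores)
        chosen = min(j for j in range(n) if scores[j] == best)   # min breaks ties
        reward = stack[chosen][nu[chosen]]
        pi[chosen] += (bids[chosen] / b_max) * reward   # reward as seen by the algorithm
        nu[chosen] += 1
    return nu
```

Because the rule is deterministic given the bids and the reward table, rerunning it with only `bids[i]` changed gives a direct way to inspect how agent $i$'s impression count responds to its bid.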

Lemma 5.7. In the stochastic MAB mechanism design problem, let $\mathcal{A}$ be an MAB allocation rule induced by a well-formed MAB algorithm. Then $\mathcal{A}$ is stochastically monotone.

Proof. We will use an alternative way to define a realization of random click rewards: a stack-realization is a $k \times T$ table in which the $(i, t)$ entry is the click bit that agent $i$ receives the $t$-th time she is played. Clearly a stack-realization and a bid vector uniquely determine the behavior of $\mathcal{A}$. We will show that:

$\mathcal{A}$ is monotone for each stack-realization. (13)

Then $\mathcal{A}$ is monotone in expectation over any distribution over stack-realizations, and in particular it is monotone in expectation over the random clicks in the stochastic MAB setting, so the Lemma follows.

Let us prove Claim (13). Throughout the proof, fix stack-realization $\sigma$, agent $i$, and bid vector $b_{-i}$. Consider two bids $b_i < b_i^+$. The claim asserts that agent $i$ receives at least as many clicks with bid $b_i^+$ as with bid $b_i$.

Let us introduce some notation. Given stack-realization $\sigma$ and bid vector $b = (b_{-i}, b_i)$, let $\mathcal{A}(b_i, t)$ be the agent selected by the allocation rule in round $t$, and let $\nu_j(b_i, t)$ (resp., $\pi_j(b_i, t)$) be the total number of impressions (resp., total click reward) of agent $j$ in the first $t$ rounds. Let $\tilde{\pi}_j(b_i, t) = (b_j/b_{\max}) \, \pi_j(b_i, t)$ be the corresponding total modified click rewards. Note that $\nu_j(b_i, t)$ uniquely determines $\pi_j(b_i, t)$:

$$\pi_j(b_i, t) = \sum_{s=1}^{\nu_j} \sigma_j(s), \quad \text{where } \nu_j = \nu_j(b_i, t). \qquad (14)$$

Let us overview the forthcoming technical argument. We will show by induction on $t$ that $\nu_i(b_i, t) \le \nu_i(b_i^+, t)$ for all $t$. For the induction step we only need to worry about the case when the claim holds for a given $t$ with equality. In this case we show that $\nu_{-i}(b_i, t) = \nu_{-i}(b_i^+, t)$. This is trivial for $n = 2$ agents; the general case requires a rather delicate argument that uses the χ-IIA property in Definition 5.5.^10

^8 As the algorithm is deterministic, the probability $A_i^t(\rho)$ is trivial: either 0 or 1. The function $\chi_i()$ is the same for all $t$.

^9 To ensure the χ-IIA property, we use a slightly modified version of UCB1: $\log T$ is used instead of $\log t$, and min is used to break ties (instead of an arbitrary rule). This change does not affect regret guarantees. We will denote this version as UCB1 without further notice.

^10 Also, we will use the fact that the probabilities $\chi_j(\pi, \nu)$ in Definition 5.5 do not depend on the round (given $j$ and $(\pi, \nu)$). This is the only place in any of the proofs where we invoke this fact.
Now let us carry out the proofs in detail. First, denote $\nu_*(b_i, t) = t - \nu_i(b_i, t)$, and let us show that for any two rounds $t, s$ it holds that

$$\nu_*(b_i, t) = \nu_*(b_i^+, s) \implies \nu_{-i}(b_i, t) = \nu_{-i}(b_i^+, s). \qquad (15)$$

Let us use induction on $\nu_*(b_i, t)$. For $\nu_*(b_i, t) = 0$ the statement is trivial. For the induction step, suppose Eq. (15) holds whenever $\nu_*(b_i, t) = m$, and let us suppose $\nu_*(b_i, t) = \nu_*(b_i^+, s) = m + 1$. Let $t'$ and $s'$ be the latest rounds such that $\nu_*(b_i, t') = \nu_*(b_i^+, s') = m$. By the induction hypothesis, $\nu_{-i}(b_i, t') = \nu_{-i}(b_i^+, s')$. It remains to prove that $\mathcal{A}(b_i, t' + 1) = \mathcal{A}(b_i^+, s' + 1)$, i.e. that the allocation rule's selections in round $t' + 1$ given bids $(b_{-i}, b_i)$, and in round $s' + 1$ given bids $(b_{-i}, b_i^+)$, are the same.^11 By Definition 5.5 these selections are uniquely determined (given the stack-realization) by the bids and the impression counts $\nu$. By the choice of $t'$ and $s'$, neither of the two selections is $i$, so by the χ-IIA condition in Definition 5.5 the selections are uniquely determined by $b_{-i}$ and $\nu_{-i}$, and hence are the same. This proves Eq. (15).

Now, to prove Claim (13) it suffices to show that for all $t$

$$\nu_i(b_i, t) \le \nu_i(b_i^+, t). \qquad (16)$$

Let us use induction on $t$. The claim is trivial for $t = 1$, since the click bit of agent $i$ in round 1 does not depend on $(b; \sigma)$. For the induction step, assume that the assertion Eq. (16) holds for some $t$, and let us prove it for $t + 1$. Note that (using the notation from Definition 5.5)

$$\nu_i(b_i, t + 1) = \nu_i(b_i, t) + \chi_i(\tilde{\pi}(b_i, t); \nu(b_i, t)).$$

Now, $\nu_i(b_i, t) \le \nu_i(b_i^+, t)$ by the induction hypothesis. If the inequality is strict then Eq. (16) trivially holds for $t + 1$. Now suppose $\nu_i(b_i, t) = \nu_i(b_i^+, t)$. Then by Eq. (15) we have $\nu(b_i, t) = \nu(b_i^+, t)$. Moreover, by Eq. (14) we have $\pi(b_i, t) = \pi(b_i^+, t)$ and therefore $\tilde{\pi}_{-i}(b_i, t) = \tilde{\pi}_{-i}(b_i^+, t)$ and $\tilde{\pi}_i(b_i, t) \le \tilde{\pi}_i(b_i^+, t)$. Thus, by the χ-monotonicity property in Definition 5.5 we have

$$\chi_i(\tilde{\pi}(b_i, t); \nu(b_i, t)) \le \chi_i(\tilde{\pi}(b_i^+, t); \nu(b_i^+, t)).$$

This concludes the proof of Eq. (16), and thus that of Claim (13). □
5.3 Truthfulness vs regret 

In this subsection we focus on the stochastic MAB setting, and consider the trade-off between regret and 
various notions of truthfulness. Ideally, one would like an MAB mechanism to be truthful in the strongest 
possible sense (universally ex-post), and have the same regret bounds as optimal MAB algorithms. 

Let us start with some background. In [8] it was proved that any deterministic mechanism that is ex-post truthful and ex-post normalized (under very mild restrictions), and any distribution over such deterministic mechanisms, incurs much higher regret than an optimal MAB algorithm such as UCB1. Namely, the lower bound in [8] states that $R(T) = \Omega(n^{1/3} T^{2/3})$, whereas UCB1 has regret $R(T) = O(\sqrt{nT \log T})$.^12 For $\delta$-gap instances the difference is even more pronounced: the analysis in [8] provides a polynomial lower bound of $R_\delta(T) = \Omega(\delta T^{\lambda})$ for some $\lambda > 0$, whereas UCB1 achieves logarithmic regret $R_\delta(T) = O(\frac{n}{\delta} \log T)$.

Our first result is that we can use the machinery from Section 5.2 to match the regret of UCB1 for truthful mechanisms. We apply Theorem 5.1 (with $\mu = \frac{1}{T}$) and Lemma 5.7 to UCB1 to obtain the following corollary:

^11 Then $\nu_{-i}(b_i, t) = \nu_{-i}(b_i^+, s)$ because in all rounds from $t' + 2$ to $t$ (resp., from $s' + 2$ to $s$) agent $i$ is played.

^12 Following the literature on regret minimization, we are mainly interested in the asymptotic behavior of $R(T)$ as a function of $T$ when $n$ is fixed.
Corollary 5.8. In the stochastic MAB mechanism design problem, there exists a mechanism $\mathcal{M}$ such that

(a) $\mathcal{M}$ is stochastically truthful, ex-post no-positive-transfers, and universally ex-post individually rational.

(b) $\mathcal{M}$ has regret $R(T) = O(\sqrt{nT \log T})$ and $\delta$-gap regret $R_\delta(T) = O(\frac{n}{\delta} \log T)$.

Remark 5.9. The regret and $\delta$-gap regret in the above corollary are within small factors (resp., $O(\sqrt{\log T})$ and $O(1)$) of the best possible for any MAB allocation rule.

Remark 5.10. [8] provides a weaker result which transforms any monotone MAB algorithm such as UCB1 into a truthful and normalized MAB mechanism with matching regret bounds. The guarantees in [8] are weaker for the following reasons. First, it only applies to 0-1 click rewards, whereas our setting allows for arbitrary click rewards in $[0, 1]$. Second, the individual rationality guarantee in [8] is much weaker: an agent may be charged more than her bid (which never happens in our mechanism), and the charge may be huge, as high as $b_i \times (4n)^T$; thus, a risk-averse agent may be reluctant to participate. Third, the no-positive-transfers guarantee is weaker: for some realizations of the click rewards the expected payment may be negative. Finally, the payment rule in [8] requires (as stated) a prohibitively expensive computation.

The truthfulness in Corollary 5.8 is only in expectation over the random click rewards. Thus, after seeing a specific realization of the rewards an agent might regret having been truthful. Accordingly, we would like a stronger property: ex-post truthfulness, i.e. truthfulness for every given realization of the rewards.

The main result of this section is an ex-post truthful MAB mechanism with optimal regret bounds. Unlike Corollary 5.8, this result requires designing a new MAB allocation rule. This allocation rule and its analysis are the main technical contributions.

Theorem 5.11. In the stochastic MAB mechanism design problem, there is a mechanism $M$ such that

(a) $M$ is ex-post truthful, ex-post no-positive-transfers, and universally ex-post individually rational.

(b) $M$ has regret $R(T) = O(\sqrt{nT \log T})$ and $\delta$-gap regret $R_\delta(T) = O(\frac{n}{\delta} \log T)$.

The theorem follows from Theorem 5.1 (with $\mu = 1/T$) if there exists an MAB allocation rule that is
ex-post monotone and has the claimed regret bounds. To the best of our knowledge, such an allocation rule is
absent from the literature. Below we provide such an allocation rule, called NewCB.

NewCB maintains a set of active agents; initially all agents are active. For each round $t$, there is a
designated agent $i = 1 + (t \bmod n)$. If this agent is active, then it is allocated. Else, an active agent is
chosen uniformly at random and allocated. For each agent $i$, lower and upper confidence bounds $(L_i, U_i)$ on the product
$b_i \mu_i$ are maintained (recall that $\mu_i$ is the CTR of agent $i$). After each round, each agent is de-activated if
its upper confidence bound is smaller than someone else's lower confidence bound. The pseudocode is in
Algorithm 3.

Fix a realization $\rho$ and a bid vector $b$. Let $S_{\mathrm{act}}(t, b)$ be the set of active agents after round $t$. For each agent
$i$, let $L_i(t, b)$ and $U_i(t, b)$ be the values of $L_i$ and $U_i$ after round $t$.

The goal of the specific update rules for the confidence bounds (lines 15-20) and the statistics (line 13)
is to guarantee the following two properties:

• the statistics are kept only for rounds when a designated agent is played. Moreover, for each agent $i$
and round $t$, and any two bid vectors $b$ and $b'$ we have




(17) 
Algorithm 3 NewCB: ex-post monotone MAB allocation rule.

1: Given: $n$ = #agents, $T$ = #rounds, upper bound $b_{\max}$.

2: Solicit a bid vector $b$ from the agents; $b \leftarrow b/b_{\max}$.

3: Initialize: set of active agents $S_{\mathrm{act}}$ = {all agents}.

4: for all agent $i$ do

5:   $c_i \leftarrow 0$; $n_i \leftarrow 0$ {total click reward and #impressions}

6:   {the totals are only over "designated" rounds}

7:   $U_i \leftarrow b_i$; $L_i \leftarrow 0$ {Upper and Lower Confidence Bounds}

8: {Main Loop}

9: for rounds $t = 1, 2, \ldots, T$ do

10:   $i \leftarrow 1 + (t \bmod n)$. {The "designated" agent}

11:   if $i \in S_{\mathrm{act}}$ then

12:     Allocate agent $i$.

13:     $n_i \leftarrow n_i + 1$; $c_i \leftarrow c_i$ + reward. {Update statistics.}

14:     {Update confidence bounds.}

15:     if $L_i < U_i$ then

16:       $(L'_i, U'_i) \leftarrow b_i \left( c_i/n_i \mp \sqrt{8 \log(T)/n_i} \right)$.

17:       if $\max(L_i, L'_i) < \min(U_i, U'_i)$ then

18:         $(L_i, U_i) \leftarrow (\max(L_i, L'_i), \min(U_i, U'_i))$.

19:       else

20:         $(L_i, U_i) \leftarrow \left( \frac{L_i + U_i}{2}, \frac{L_i + U_i}{2} \right)$.

21:   else

22:     Allocate an agent chosen u.a.r. from $S_{\mathrm{act}}$.

23:   for all agent $i \in S_{\mathrm{act}}$ do

24:     if $U_i < \max_{j \in S_{\mathrm{act}}} L_j$ then

25:       Remove $i$ from $S_{\mathrm{act}}$.



• for any fixed realization $\rho$ and bid vector $b$, and each agent $i$: $L_i \le U_i$, and from round to round $L_i$ is
non-decreasing and $U_i$ is non-increasing. In other words, for each round $t$ it holds that

$L_i(t-1, b) \le L_i(t, b) \le U_i(t, b) \le U_i(t-1, b)$. (18)

The ex-post monotonicity follows from these two properties and the de-activation rule (lines 24-25). 
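
The update and de-activation rules of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `new_cb`, the table `rewards[t][i]` encoding a fixed realization of click rewards, and the 0-indexed designated agent `t % n` are choices made here for illustration. The sketch also assumes that when the fresh and current confidence intervals do not overlap (line 20), the interval collapses to its midpoint, which is one rule that preserves property (18).

```python
import math
import random


def new_cb(bids, rewards, b_max, rng):
    """Sketch of NewCB. rewards[t][i] is the click reward agent i would
    receive if allocated in round t+1 (a fixed realization).
    Returns (impressions per agent, per-round snapshots of (L, U, active))."""
    n, T = len(bids), len(rewards)
    b = [x / b_max for x in bids]          # line 2: normalize the bids
    active = set(range(n))                 # line 3
    c = [0.0] * n                          # click reward over designated rounds
    cnt = [0] * n                          # impressions over designated rounds
    L = [0.0] * n                          # lower confidence bounds
    U = list(b)                            # upper confidence bounds
    impressions = [0] * n
    history = []
    for t in range(1, T + 1):
        i = t % n                          # 0-indexed analogue of 1 + (t mod n)
        if i in active:
            impressions[i] += 1            # line 12: allocate designated agent
            cnt[i] += 1                    # line 13: update statistics
            c[i] += rewards[t - 1][i]
            if L[i] < U[i]:                # lines 15-20: update the bounds
                radius = math.sqrt(8 * math.log(T) / cnt[i])
                lo = b[i] * (c[i] / cnt[i] - radius)
                hi = b[i] * (c[i] / cnt[i] + radius)
                if max(L[i], lo) < min(U[i], hi):
                    L[i], U[i] = max(L[i], lo), min(U[i], hi)
                else:
                    # ASSUMPTION: collapse to the midpoint of the old interval.
                    L[i] = U[i] = (L[i] + U[i]) / 2
        else:
            j = rng.choice(sorted(active)) # line 22: uniform over active agents
            impressions[j] += 1            # statistics are NOT updated here
        best = max(L[j] for j in active)   # lines 23-25: de-activation
        active = {j for j in active if U[j] >= best}
        history.append((list(L), list(U), set(active)))
    return impressions, history


if __name__ == "__main__":
    env = random.Random(0)
    ctr = [0.9, 0.5, 0.2]                  # hypothetical click-through rates
    table = [[1.0 if env.random() < m else 0.0 for m in ctr] for _ in range(3000)]
    imps, hist = new_cb([1.0, 0.8, 0.5], table, 1.0, random.Random(1))
    print(imps, sorted(hist[-1][2]))
```

Because the statistics are touched only in designated rounds, the confidence bounds of an agent do not depend on which agent the random choice in line 22 happens to pick; this is the design choice that the first property above relies on.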

Ex-post monotonicity. Let $L^*(t, b) = \max_{j \in S_{\mathrm{act}}(t, b)} L_j(t, b)$. Fix agent $i$ and a bid $b_i^+ > b_i$, and let $b^+ = (b_{-i}, b_i^+)$
be the "alternative" bid vector.

Claim 5.12. We establish the following sequence of claims:
(C1) $L^*(t, b)$ is non-decreasing in $t$, for any fixed $b$.
(C2) For each round $t$, $L^*(t, b) \le L^*(t, b^+)$.
(C3) For each round $t$, $S_{\mathrm{act}}(t, b^+) \setminus \{i\} \subseteq S_{\mathrm{act}}(t, b) \setminus \{i\}$.
(C4) In each round $t$: if $i \in S_{\mathrm{act}}(t, b)$ then $i \in S_{\mathrm{act}}(t, b^+)$.

Proof. Let us prove the parts (C1-C4) one by one.

(C1). We use Eq. (18) and the de-activation rule. Throughout the proof, we omit the $b$. Fix round $t \ge 2$. Let
$i \in S_{\mathrm{act}}(t-1)$ be an agent such that $L^*(t-1) = L_i(t-1)$. If $i \in S_{\mathrm{act}}(t)$ then

$L^*(t-1) = L_i(t-1) \le L_i(t) \le L^*(t)$.

Else $i$ is de-activated in round $t$, so

$L^*(t-1) = L_i(t-1) \le L_i(t) \le U_i(t) < L^*(t)$.

(C2). Let us use induction on $t$. For $t = 0$, $L^*(t, \cdot) = 0$. Suppose the claim holds for $t-1$, and let us prove it for
$t$. Suppose, for the sake of contradiction, that $L^*(t, b) > L^*(t, b^+)$. Let $j \in S_{\mathrm{act}}(t, b)$ be an agent such that
$L_j(t, b) = L^*(t, b)$. Then by Eq. (17) it follows that $j \notin S_{\mathrm{act}}(t, b^+)$. Thus with bid vector $b^+$ agent $j$ gets
disqualified after some round $s \le t$. Thus, using Part (C1),

$U_j(s, b^+) < L^*(s, b^+) \le L^*(t, b^+)$.

Now using Eq. (18) and Eq. (17) (for the right-most inequality), we get that

$L^*(t, b) = L_j(t, b) \le U_j(t, b) \le U_j(s, b) \le U_j(s, b^+)$.

Thus, $L^*(t, b) < L^*(t, b^+)$, the desired contradiction.

(C3). Use induction on $t$. The claim trivially holds for $t = 1$. Assuming the claim holds for some $t$, we
prove it holds for $t + 1$. Fix agent $j \in S_{\mathrm{act}}(t, b^+) \setminus \{i\}$. Then $j \in S_{\mathrm{act}}(t, b)$ by the induction hypothesis, so
by Eq. (17) $U_j(t, b) = U_j(t, b^+)$. Moreover, $L^*(t, b) \le L^*(t, b^+)$ by Part (C2). Thus, agent $j$ is de-activated
with bid vector $b$ only if it is de-activated with bid vector $b^+$.

(C4). Using Part (C3) and property Eq. (17), we can show that in each round $t$ we have

either $L^*(t, b^+) = L^*(t, b)$ or $L^*(t, b^+) = L_i(t, b^+)$.

Now let us use induction on $t$. Suppose $i \in S_{\mathrm{act}}(t+1, b)$. Then $U_i(t, b) \ge L^*(t, b)$, and we need to
show $U_i(t, b^+) \ge L^*(t, b^+)$. Note that $i \in S_{\mathrm{act}}(t, b)$, so $i \in S_{\mathrm{act}}(t, b^+)$ by the induction hypothesis, and
so $U_i(t, b^+) \ge U_i(t, b)$ by property Eq. (17). Thus, $U_i(t, b^+) \ge L^*(t, b)$. Also, $U_i(t, b^+) \ge L_i(t, b^+)$ by
property Eq. (18). Therefore,

$U_i(t, b^+) \ge \max(L^*(t, b), L_i(t, b^+)) \ge L^*(t, b^+)$. □

Ex-post monotonicity follows easily from (C3-C4). Fix round $t$ and let $q_i(t, b)$ be the probability that
with bid vector $b$, agent $i$ receives an impression in this round.

Claim 5.13. For each round $t$, $q_i(t, b) \le q_i(t, b^+)$.

Proof. If $q_i(t, b) > 0$ then agent $i$ is active with bid vector $b$, and hence (using (C4)) with $b^+$. If agent $i$
is the designated agent in round $t$, then $q_i(t, b) = q_i(t, b^+) = 1$. Else, $q_i(t, \cdot) = 1/|S_{\mathrm{act}}(t, \cdot)|$, and so
$q_i(t, b) \le q_i(t, b^+)$ since by (C3) $|S_{\mathrm{act}}(t, b^+)| \le |S_{\mathrm{act}}(t, b)|$. □

Regret. The regret analysis is relatively standard, following the ideas in [?]. For simplicity assume that
$b_{\max} = 1$. Fix a bid vector $b$. For each agent $i$, let $c_i(t)$ and $n_i(t)$ be, respectively, the number of clicks
and impressions in rounds $s \le t$ when it is the designated agent. Let $r_i(t) = \sqrt{8 \log(T) / n_i(t)}$. Then
$|\mu_i - c_i(t)/n_i(t)| \le r_i(t)$ with probability at least $1 - T^{-2}$. In what follows, let us assume that this event holds.
Then it easily follows from the specs of NewCB that

$$\begin{cases} L_i(t, b) \le b_i \mu_i \le U_i(t, b) \\ U_i(t, b) - L_i(t, b) \le 4\, b_i\, r_i(t). \end{cases}$$

Let $\Delta_i = (\max_j b_j \mu_j) - b_i \mu_i$ be the suboptimality of agent $i$. Then each agent $i$ with $\Delta_i > 0$ gets de-activated
after $O(n \Delta_i^{-2} \log T)$ rounds, hence the bounds on regret.
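
The de-activation bound can be unpacked as the following worked derivation, offered as a sketch rather than the authors' own argument. It assumes the high-probability event above, $b_{\max} = 1$, and a width bound of the form $U_i(t,b) - L_i(t,b) \le C\, b_i\, r_i(t)$ for some absolute constant $C$ (the symbol $C$ is introduced here for illustration). Let $i^*$ be an agent maximizing $b_j \mu_j$; under the event, $U_{i^*} \ge b_{i^*}\mu_{i^*} \ge b_j \mu_j \ge L_j$ for every $j$, so $i^*$ is never de-activated.

```latex
% Notation introduced for this sketch: r(t) = \max\{ r_i(t), r_{i^*}(t) \},
% m(t) = \min\{ n_i(t), n_{i^*}(t) \}, so that r(t) = \sqrt{8 \log(T) / m(t)}.
\begin{align*}
U_i(t,b) &\le L_i(t,b) + C\, b_i\, r_i(t) \le b_i \mu_i + C\, r(t), \\
L^*(t,b) &\ge L_{i^*}(t,b) \ge U_{i^*}(t,b) - C\, b_{i^*}\, r_{i^*}(t)
          \ge b_{i^*}\mu_{i^*} - C\, r(t), \\
U_i(t,b) < L^*(t,b)
  &\;\Longleftarrow\; 2C\, r(t) < \Delta_i
  \;\Longleftrightarrow\; m(t) > \frac{32\, C^2 \log T}{\Delta_i^2}.
\end{align*}
```

Since every active agent is the designated agent once in every $n$ consecutive rounds, $m(t)$ exceeds this threshold after $O(n \Delta_i^{-2} \log T)$ rounds, at which point the de-activation rule removes agent $i$.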
5.4 The power of randomization

A by-product of Theorem 5.11 is a separation between the power of deterministic and randomized
mechanisms, in terms of regret for MAB mechanisms that are ex-post truthful and ex-post normalized. The lower
bound for deterministic mechanisms is from [8].

One challenge here is to ensure that the upper and lower bounds talk about exactly the same problem; as
stated, Theorem 5.11 and the main lower bound result from [8] do not. To bypass this problem, we focus on
the case of two agents, and use a more general version of the lower bound: Theorem C.1 in the full version
of [8]. Further, to match [8] we extend the mechanism from Theorem 5.11 to a setting in which $b_{\max}$ is not
known a priori.

We formulate the separation theorem as follows. Denote $R(T, b_{\max}) = \max R(T; b; \mu)$, where the
maximum is taken over all CTR vectors $\mu$ and all bid vectors $b$ such that $b_i \le b_{\max}$ for all $i$.

Theorem 5.14. Consider the stochastic MAB mechanism design problem with two agents. Assume $b_{\max}$ is
not known a priori to the mechanism. Suppose $M$ is an MAB mechanism that is (i) ex-post truthful and
ex-post normalized, and (ii) has regret $R(T, b_{\max}) = O(b_{\max} T^{\gamma})$ for some $\gamma$ and any $b_{\max}$. Then:

(a) [8] If $M$ is deterministic then $\gamma \ge 2/3$.

(b) There exists such a randomized $M$ with $\gamma = 1/2$.

Proof of part (b). Let $A'$ be the ex-post monotone MAB allocation rule in Theorem 5.11 for $b_{\max} = 1$.
Define an MAB allocation rule $A$ as a rule that inputs the bid vector $b$ and passes the modified bid vector
$b' = b/(\max_i b_i)$ to $A'$. We claim that $A$ is ex-post monotone, too. Indeed, w.l.o.g. assume $b_1 \ge b_2$. If $b_2$
increases (to a value $\le b_1$), then $b'_2$ increases while $b'_1$ stays the same. Thus, the total click reward of agent 2
does not decrease. If $b_1$ increases then $b'_2$ decreases while $b'_1$ stays the same, so the total click reward of agent 2 does
not increase, which implies that the total click reward of agent 1 does not decrease. Claim proved. Now part
(b) follows from Theorem 5.1 (with $\mu = 1/T$). □
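
The bid normalization used in this proof is simple enough to sketch directly. The helper name `normalize_bids` is hypothetical and introduced only for illustration; it computes $b' = b/(\max_i b_i)$, so that the rule designed for $b_{\max} = 1$ can be applied without knowing $b_{\max}$ in advance.

```python
def normalize_bids(bids):
    """Return b' = b / max_i b_i. Assumes at least one strictly positive bid."""
    top = max(bids)
    if top <= 0:
        raise ValueError("need at least one strictly positive bid")
    return [x / top for x in bids]


if __name__ == "__main__":
    print(normalize_bids([4.0, 2.0]))   # baseline: agent 1 is the top bidder
    print(normalize_bids([4.0, 3.0]))   # agent 2 raises her bid: b'_2 goes up
    print(normalize_bids([8.0, 2.0]))   # agent 1 raises her bid: b'_2 goes down
```

In the two-agent case the top bidder's normalized bid stays at 1, so any change in either raw bid shows up only as a change in the lower bidder's normalized bid, which is what the monotonicity argument above exploits.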

6 Open Questions 

This paper gives rise to a number of open questions, which fall into three directions. The first direction con- 
cerns the "quality" of our general transformation: is it the best possible? One concrete way to phrase this is 
whether it obtains the optimal trade-off between the loss in welfare and the maximal rebate size? For positive 
types, what if no rebates (i.e., positive transfers) are allowed? The second direction concerns the power of 
randomization for mechanism design. We have a separation result for ex-post truthful MAB mechanisms. 
Can one obtain similar separation results for other single-parameter domains? Third, Theorem 5.1 opens up
the "monotone MAB allocation design problem", valid for any MAB setting in the literature. Perhaps the 
most interesting question here is whether one can match the optimal regret for the adversarial MAB setting. 
A Non-recursive self-resampling



In this section we prove Proposition 3.6, which asserts that Recursive and OneShot (as defined in
Algorithm 1 and Algorithm 2, respectively) generate the same output distribution.

In what follows, we will omit the subscript $i$ from the description of the procedures. That is, a
self-resampling procedure inputs a scalar bid $b$ and a random seed $w$, and outputs two numbers $(x, y)$. For
convenience, let us restate Proposition 3.6.

Proposition A.1. Recursive and OneShot generate the same output distribution: for any bid $b \in [0, \infty)$,
the joint distribution of the pair $(x, y) = (x(b; w), y(b; w))$ is the same for both procedures.

The proof of the proposition uses the following well-known fact about exponential distributions which 
we state without a proof: 

Lemma A.2. Let $X_1, \ldots, X_n$ be drawn independently, uniformly at random from $(0, 1)$. Let $F(x, p) =
-\ln(1 - x)/p$, and let $p_1, \ldots, p_n$ be positive numbers that sum up to 1. Then each $Y_i = F(X_i, p_i)$ is an
exponentially distributed random variable with rate $p_i$. Moreover, $\Pr[Y_i = \min_j Y_j] = p_i$.
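
Lemma A.2 is easy to check numerically. The following Python sketch is an illustration only (the function names `inverse_cdf` and `argmin_frequencies` are hypothetical): it draws $Y_i = F(X_i, p_i)$ by inverse-CDF sampling and estimates how often each $Y_i$ is the minimum, which by the lemma should be close to $p_i$.

```python
import math
import random


def inverse_cdf(x, p):
    """F(x, p) = -ln(1 - x) / p: maps a uniform draw x in [0, 1) to an
    exponential random variable with rate p."""
    return -math.log(1.0 - x) / p


def argmin_frequencies(rates, trials, rng):
    """Estimate Pr[Y_i = min_j Y_j] for independent exponentials Y_i with
    the given rates, using `trials` Monte Carlo rounds."""
    wins = [0] * len(rates)
    for _ in range(trials):
        ys = [inverse_cdf(rng.random(), p) for p in rates]
        wins[ys.index(min(ys))] += 1
    return [w / trials for w in wins]


if __name__ == "__main__":
    rates = [0.2, 0.3, 0.5]   # positive numbers summing to 1
    print(argmin_frequencies(rates, 20000, random.Random(0)))
```

The estimated frequencies should be close to the rates themselves, up to Monte Carlo error of order $1/\sqrt{\text{trials}}$.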

Let us restate Recursive and OneShot so that the random seed is explicit. Both procedures will
use a random seed $w$ consisting of two infinite sequences $(\beta_1, \beta_2, \ldots)$ and $(\gamma_1, \gamma_2, \ldots)$ such that the $\beta_j$ are
i.i.d. Bernoulli random variables with $E(\beta_j) = 1 - \mu$, and the $\gamma_j$ are i.i.d. uniform random variables in $[0, 1]$.
Then the two procedures can be restated as follows.

Recursive$(b; w)$: If $\beta_1 = 1$, output $x(b; w) = y(b; w) = b$. Otherwise, let $b' = b \cdot \gamma_1$, and let $w'$
denote the shifted random seed consisting of the sequences $(\beta_2, \beta_3, \ldots)$ and $(\gamma_2, \gamma_3, \ldots)$. Output
$x(b; w) = x(b'; w')$ and $y(b; w) = b'$. Note that the recursive procedure for calculating $x$ terminates
as long as $\beta_j = 1$ for at least one value of $j$, an event that has probability 1.

OneShot$(b; w)$: If $\beta_1 = 1$, output $x(b; w) = y(b; w) = b$. Otherwise, output $x(b; w) = b \cdot \gamma_1^{1/(1-\mu)}$ and
$y(b; w) = b \cdot \max\{\gamma_1^{1/(1-\mu)}, \gamma_2^{1/\mu}\}$.
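
As a sanity check on the two restated procedures, the following Python sketch samples from both and compares simple statistics of $\log(b/x)$ and $\log(b/y)$. It is an illustration under stated assumptions: the function names `recursive` and `one_shot` are chosen here, the random seed is drawn lazily from a `random.Random` generator instead of being passed as two infinite sequences, the recursion is unrolled into a loop, and $0 < \mu < 1$ is assumed.

```python
import math
import random


def _uniform(rng):
    """Uniform draw from (0, 1], so that logarithms and powers stay finite."""
    return 1.0 - rng.random()


def recursive(b, mu, rng):
    """Sketch of the recursive procedure: returns (x, y)."""
    if rng.random() < 1.0 - mu:          # beta_1 = 1: keep the bid
        return b, b
    y = b * _uniform(rng)                # b' = b * gamma_1
    x = y
    while rng.random() >= 1.0 - mu:      # beta_j = 0: recurse on the new bid
        x *= _uniform(rng)
    return x, y


def one_shot(b, mu, rng):
    """Sketch of the non-recursive procedure: returns (x, y)."""
    if rng.random() < 1.0 - mu:          # beta_1 = 1: keep the bid
        return b, b
    g1 = _uniform(rng) ** (1.0 / (1.0 - mu))
    g2 = _uniform(rng) ** (1.0 / mu)
    return b * g1, b * max(g1, g2)


def mean_log_ratio(samples, b, k):
    """Empirical mean of log(b / sample[k]) for k = 0 (x) or k = 1 (y)."""
    return sum(math.log(b / s[k]) for s in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(1)
    mu, b, trials = 0.3, 1.0, 40000
    rec = [recursive(b, mu, rng) for _ in range(trials)]
    one = [one_shot(b, mu, rng) for _ in range(trials)]
    for k, name in enumerate(("log(b/x)", "log(b/y)")):
        print(name, mean_log_ratio(rec, b, k), mean_log_ratio(one, b, k))
```

Matching means are of course only a necessary condition for equal joint distributions; the proof below establishes the full statement.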

It will be useful to consider the joint distribution of $(l_x, l_y) = (\log(b/x), \log(b/y))$ when $(x, y) =
(x(b; w), y(b; w))$ is sampled using either Recursive or OneShot. From the descriptions of the two
algorithms, and the fact that $\log(1/x)$ is exponentially distributed with mean 1 when $x$ is uniformly
distributed on $[0, 1]$, we derive the following two sampling procedures for $(l_x, l_y)$.

Procedure 1: With probability $1 - \mu$ output $l_x = l_y = 0$. Otherwise, repeatedly sample an independent
exponential random variable $v_i$ with mean 1 and Bernoulli random variable $w_i$ with mean $1 - \mu$ until
the first $t$ such that $w_t = 1$. Output $l_x = \sum_{i=1}^{t} v_i$, $l_y = v_1$.

Procedure 2: With probability $1 - \mu$ output $l_x = l_y = 0$. Otherwise, sample two independent exponential
random variables $u_1, u_2$ with means $1/(1 - \mu)$ and $1/\mu$ respectively. Output $l_x = u_1$ and $l_y =
\min(u_1, u_2)$.

Our goal is to prove that Procedures 1 and 2 generate the same joint distribution of $(l_x, l_y)$; clearly this
reduces to proving that the conditional distribution of $(l_x, l_y)$ given that $(l_x, l_y) \ne (0, 0)$ is the same in both
cases. To this end, let $v_1, v_2, \ldots$ denote an infinite sequence of independent exponentially distributed random
variables, each with mean 1, and let $w_1, w_2, \ldots$ denote an infinite sequence of independent Bernoulli random
variables, each with mean $1 - \mu$. Consider the sequence of partial sums $s_1, s_2, \ldots$ given by $s_j := \sum_{i=1}^{j} v_i$.
Let $r_1 = \min\{s_j : w_j = 1\}$ and $r_2 = \min\{s_j : w_j = 0\}$. Conditional on the event that $(l_x, l_y) \ne (0, 0)$,
Procedure 1 outputs $(l_x, l_y)$ with the same distribution as $(r_1, \min\{r_1, r_2\})$. Recall that Procedure 2 outputs
$(l_x, l_y) = (u_1, \min\{u_1, u_2\})$. Thus, to finish the proof, it suffices to prove that $(r_1, r_2)$ have the same joint
distribution as $(u_1, u_2)$. We prove this fact using a sequence of three observations.

1. First, $\min\{r_1, r_2\}$ has the same distribution as $\min\{u_1, u_2\}$: both are exponentially distributed with
mean 1. For $\min\{r_1, r_2\}$ this is obvious from the definition; for $\min\{u_1, u_2\}$ it follows from Lemma A.2.

2. Second, conditioning on the value of $\min\{r_1, r_2\}$ and the value of $\min\{u_1, u_2\}$, the probability that
$\min\{r_1, r_2\} = r_1$ and the probability that $\min\{u_1, u_2\} = u_1$ are both equal to $1 - \mu$. Once again, for
$r_1, r_2$ this fact is obvious from the definitions, whereas for $u_1, u_2$ it is a consequence of Lemma A.2
combined with the fact that an exponential random variable $u$ is "memoryless": the conditional
distribution of $u - q$ given that $u > q$ is the same as the unconditional distribution of $u$. (In more detail,
the memorylessness, together with Lemma A.2, implies that for all $q$, the conditional probability that
$\min\{u_1, u_2\} = u_1$ given $\min\{u_1, u_2\} > q$ is always equal to $1 - \mu$. Since this conditional probability
is independent of $q$, it must remain the same when we condition on $\min\{u_1, u_2\} = q$ rather than
$\min\{u_1, u_2\} > q$.)

3. Third, conditioning on the event that $\min\{r_1, r_2\} = r_i$ for $i \in \{1, 2\}$, the random variable
$\max\{r_1, r_2\} - \min\{r_1, r_2\}$ is exponentially distributed with mean $m_i$, where $m_i = 1/\mu$ if $i = 1$ and
$m_i = 1/(1 - \mu)$ if $i = 2$. Moreover, the same fact applies with $(u_1, u_2)$ in place of $(r_1, r_2)$. The
claim is obvious for $(u_1, u_2)$ because each of them is memoryless and independent of the other. To
prove the claim for $(r_1, r_2)$, note from the definition of $r_1$ and $r_2$ that the conditional distribution of
$\max\{r_1, r_2\} - \min\{r_1, r_2\}$ is equal to the unconditional distribution of $V := s_k - s_1 = \sum_{i=2}^{k} v_i$,
where the $v_i$ are independent exponential random variables with mean 1, and $k - 1$ is geometrically
distributed with mean $m_i$, independent of the sequence $(v_i)$. This sum has expected value $m_i$, so it
remains to show that it is exponentially distributed, i.e. memoryless. If $V > q$ then there is exactly
one value of $j$ such that $s_j - s_1 \le q < s_{j+1} - s_1$, and $k > j$. Conditioning on the event $V > q$ and
on the value of $j$, we observe the following: the random variables $s_{j+1} - s_1 - q, v_{j+2}, v_{j+3}, \ldots$ are
independent and exponentially distributed with mean 1, and the random variable $k - j$ is geometrically
distributed with mean $m_i$. Thus, the conditional distribution of $V - q = (s_{j+1} - s_1 - q) + \sum_{j+1 < \ell \le k} v_\ell$
is exactly the same as the unconditional distribution of $V$. Now removing the conditioning on $j$, we
see that the conditional distribution of $V - q$ given the event $V > q$ is the same as the unconditional
distribution of $V$, i.e. $V$ is memoryless, as desired.
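
For reference, the standard facts about exponential random variables used in the three observations can be verified by direct computation. The following worked equations are a supplementary sketch; the rate symbol $\lambda$ is introduced here for illustration, and $u_1, u_2$ are independent with rates $1 - \mu$ and $\mu$ (i.e. means $1/(1-\mu)$ and $1/\mu$) as in Procedure 2.

```latex
\begin{align*}
% Memorylessness of an exponential random variable u with rate \lambda:
\Pr[u - q > s \mid u > q]
  &= \frac{e^{-\lambda (q + s)}}{e^{-\lambda q}} = e^{-\lambda s} = \Pr[u > s], \\
% The minimum of u_1 and u_2 is exponential with rate (1 - \mu) + \mu = 1:
\Pr[\min\{u_1, u_2\} > s]
  &= e^{-(1 - \mu) s} \, e^{-\mu s} = e^{-s}, \\
% The probability that u_1 attains the minimum:
\Pr[u_1 < u_2]
  &= \int_0^\infty (1 - \mu) \, e^{-(1 - \mu) x} \, e^{-\mu x} \, dx = 1 - \mu .
\end{align*}
```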